Preface to the Second Edition


We were very pleased with the response to the first edition of this book and we were 
very happy to do a second edition. In this second edition, we cleaned up various 
typos pointed out by readers and added some new material suggested by them. We 
have also included important new results that have appeared since the first edition
came out. These include results on gaps between primes and on the twin
primes conjecture.

We have added a new chapter, Chapter 7, on p-adic numbers, p-adic arithmetic, 
and the use of Hensel’s Lemma. This can be included in a year-long course. 

We have extended the material on elliptic curves in Chapter 5 on primality 
testing. 

We have added material in Chapter 4 on multiple-valued zeta functions. 

As before, we would like to thank the many people who read or used the first 
edition and made suggestions. We would also especially like to thank Anja 
Moldenhauer and Anja Rosenberger who helped tremendously with editing and 
LaTeX and made some invaluable suggestions about the contents.


Fairfield, USA Benjamin Fine 
Hamburg, Germany Gerhard Rosenberger 
Number theory is fascinating. Results about numbers often appear magical, both in
their statements and in the elegance of their proofs. Nowhere is this more evident 
than in results about the set of prime numbers. The Prime Number Theorem, which 
gives the asymptotic density of the prime numbers, is often cited as the most 
surprising result in all of mathematics. It certainly is the result which is hardest to 
justify intuitively. 

The prime numbers form the cornerstone of the theory of numbers. Many, if not 
most, results in number theory proceed by considering the case of primes and then 
pasting the result together for all integers by using the Fundamental Theorem of 
Arithmetic. The purpose of this book is to give an introduction and overview of 
number theory based on the central theme of the sequence of primes. The richness 
of this somewhat unique approach becomes clear once one realizes how much 
number theory and mathematics in general is needed to learn and truly understand 
the prime numbers. The approach provides a solid background in the standard 
material as well as presenting an overview of the whole discipline. All the essential 
topics are covered: the fundamental theorem of arithmetic, theory of congruences,
quadratic reciprocity, arithmetic functions, and the distribution of primes. In 
addition, there are firm introductions to analytic number theory, primality testing 
and cryptography, and algebraic number theory, as well as many interesting side 
topics. Full treatments and proofs are given to both Dirichlet’s Theorem and the 
Prime Number Theorem. There is a complete explanation of the new AKS algorithm,
which shows that primality testing can be done in polynomial time. In algebraic number
theory, there is a complete presentation of primes and prime factorizations in 
algebraic number fields. 

The book grew out of notes from several courses given for advanced
undergraduates in the United States and for teachers in Germany. The material on the
Prime Number Theorem grew out of seminars also given both at the University of 
Dortmund and at Fairfield University. The intended audience is upper level 
undergraduates and beginning graduate students. The notes upon which the book 
was based were used effectively in such courses in both the United States and 
Germany. The prerequisites are a knowledge of Calculus and Multivariable
Calculus and some Linear Algebra. The necessary ideas from Abstract Algebra and 
Complex Analysis are introduced in the book. There are many interesting exercises 
ranging from simple to quite difficult. Solutions and hints are provided to selected 
exercises. We have written the book in what we feel is a user-friendly style with 
many discussions of the history of various topics. It is our opinion that it is also 
ideal for self-study. 

There are two basic facts concerning the sequence of primes that are focused on 
in this book and from which much of the theory of numbers is introduced. The first 
fact is that there are infinitely many primes. This fact has, of course, been known since at
least the time of Euclid. However, there are a great many proofs of this result not
related to Euclid’s original proof. By considering and presenting many of these
proofs, a wide area of modern number theory is covered. This includes the fact that
the primes are numerous enough that there are infinitely many in any arithmetic
progression $an + b$ with $a, b$ relatively prime (Dirichlet’s Theorem). The proof of
Dirichlet’s Theorem allows us to first introduce analytic methods. 

Although there are infinitely many primes, their density thins
out. We first encounter this in the startling (but easily proved) result that there are
arbitrarily large gaps in the sequence of primes. The exact nature of how the 
sequence of primes thins out is formalized in the Prime Number Theorem, which as 
already mentioned, many people consider the most surprising result in mathematics. 
Presenting the proof and the ideas surrounding the proof of the Prime Number 
Theorem allows us to introduce and discuss a large portion of analytic number 
theory. 

Algebraic Number Theory arose originally as an attempt to extend unique
factorization to algebraic number rings. We use the approach of looking at primes and
prime factorizations to present a fairly comprehensive introduction to algebraic
number theory. 

Finally, modern cryptography is intimately tied to number theory. Especially 
crucial in this connection is primality testing. We discuss various primality testing 
methods, including the recently developed AKS algorithm and then provide a basic 
introduction to cryptography. 

There are several ways that this book can be used for courses. Chapter 1 together
with selections from the remaining chapters can be used for a one-semester course 
in number theory for undergraduates or beginning graduate students. The only 
prerequisites are a basic knowledge of mathematical proofs (induction, etc.) and 
some knowledge of Calculus. All the rest is self-contained, although we do use 
algebraic methods so that some knowledge of basic abstract algebra would be 
beneficial. A year-long course focusing on analytic methods can be done from 
Chapters 1, 2, 3, and 4 and selections from 5 and 6, while a year-long course 
focusing on algebraic number theory can be fashioned from Chapters 1, 2, 3, and 6 
and selections from 4 and 5. There are also possibilities for using the book for
one-semester introductory courses in analytic number theory, centering on Chapter 4, or
for a one-semester introductory course in algebraic number theory, centering on
Chapter 6. Some suggested courses: 
Basic Introductory One Semester Number Theory Course: Chapter 1, Chapter 2,
Sections 3.1, 4.1, 4.2, 5.1, 5.3, 5.4, 6.1 

Year-Long Course Focusing on Analytic Number Theory: Chapter 1, Chapter 2, 
Chapter 3, Chapter 4, Sections 5.1, 5.3, 5.4, 6.1 

Year-Long Course Focusing on Algebraic Number Theory: Chapter 1, Chapter 2, 
Chapter 3, Chapter 6, Sections 4.1, 4.2, 5.1, 5.3, 5.4 

One-Semester Course Focusing on Analytic Number Theory: Chapter 1, Chapter 2 
(as needed), Sections 3.1, 3.2, 3.3, 3.4, 3.5, Chapter 4 

One-Semester Course Focusing on Algebraic Number Theory: Chapter 1, Chapter 2 
(as needed), Chapter 6 


We would like to thank the many people who have read through other
preliminary versions of these notes and made suggestions. Included among these people
are Kati Bencsath and Al Thaler, as well as the many students who have taken the 
courses. In particular, we would like to thank Peter Ackermann, who read through 
the whole manuscript both proofreading and making mathematical suggestions. 
Peter was also heavily involved in the seminars on the Prime Number 
Theorem from which much of the material in Chapter 4 comes. 


Benjamin Fine 
Gerhard Rosenberger 
Chapter 1
Introduction and Historical Remarks 


The theory of numbers is concerned with the properties of the integers, i.e., the class
of whole numbers and zero, 0, ±1, ±2, .... The positive integers, 1, 2, 3, ..., are
called the natural numbers. The basic additive structure of the integers is relatively 
simple. Mathematically it is just an infinite cyclic group (see Chapter 2). Therefore 
the true interest lies in the multiplicative structure and the interplay between the 
additive and multiplicative structures. Given the simplicity of the additive structure, 
one of the enduring fascinations of the theory of numbers is that there are so many 
easily stated and easily understood problems and results whose proofs are either 
unknown or incredibly difficult. Perhaps the most famous of these was Fermat’s
Big Theorem, which was stated about 1650 and only recently proved by A. Wiles.
This result says that the equation $a^n + b^n = c^n$ has no nontrivial ($abc \neq 0$) integral
solutions if $n > 2$. Wiles’ proof ultimately involved the very deep theory of elliptic
curves. Another result in this category is the Goldbach conjecture, first given about
1740 and still open. This states that any even integer greater than 2 is the sum of two
primes. We mention that since the first edition of this book appeared, the weak, or
ternary, Goldbach conjecture has been proved by H.A. Helfgott [He]. This version
states that any odd number greater than 7 is the sum of three odd primes. Another
of the fascinations of number theory is that many results seem almost magical. The
prime number theorem which describes the asymptotic distribution of the prime 
numbers has often been touted as the most surprising result in mathematics. 

The cornerstone of the multiplicative theory of the integers is the series of primes 
and the fundamental theorem of arithmetic, which states that any integer greater than 1 can be
decomposed, essentially uniquely, as a product of primes. One of the basic modes
of proof in the theory of numbers is to reduce to the case of a prime and then use 
the fundamental theorem to patch back together for all integers. This concept of a 
fundamental prime decomposition, which has its origin in the fundamental theorem 
of arithmetic, permeates much of mathematics. In many different disciplines one of 
the major techniques is to find the indecomposable building blocks (the “primes” in 
that discipline) and then use these as starting points in proving general results. The
idea of a simple group and the Jordan-Hölder decomposition in group theory is one
example (see [Ro]).

© Springer International Publishing AG 2016

The purpose of this book is to give an introduction and overview of number theory 
based on the series of primes. It grew out of courses for advanced undergraduates in 
the United States and courses for teachers in Germany. There are many approaches to 
presenting this first material on number theory. We felt that this approach through the 
series of primes gave a solid background in standard material as well as presenting 
a wide overview of the whole discipline. 

Modern number theory has essentially three branches, which overlap in many 
areas. The first is elementary number theory, which can be quite nonelementary, 
and which consists of those results concerning the integers themselves which do not 
use analytic methods. This branch has many subbranches: the theory of congruences, 
diophantine analysis, geometric number theory, quadratic residues to mention a few. 
The second major branch is analytic number theory. This is the branch of the theory 
of numbers that studies the integers by using methods of real and complex analysis. 
The final major branch is algebraic number theory which extends the study of the 
integers to other algebraic number fields. By examining the series of primes we will 
touch on all these areas. 

In Chapter 2 we will consider the basic material in elementary number theory: the 
fundamental theorem of arithmetic, the theory of congruences, quadratic reciprocity 
and related results. One of the most important straightforward results is that there are
infinitely many primes. In Chapter 3 we will look at a collection of proofs of
this result. We will also look at Dirichlet’s Theorem, which says that there are infinitely
many primes in any arithmetic progression $an + b$ with $a, b$ relatively prime, and at the
twin prime conjecture. Although there are infinitely many primes, their density tends to
thin out. It was observed, though, that if $\pi(x)$ denotes the number of primes less than or
equal to $x$, then this function behaves asymptotically like the function $x/\ln x$. This result
is known as the prime number theorem. Besides being a startling result, the proof of the
prime number theorem, done independently by Hadamard and de la Vallée Poussin, became the
genesis of analytic number theory. We will discuss the prime number theorem and its
proof, as well as the Riemann hypothesis, in Chapter 4. For large integers, determining
whether a number is prime and determining its factorization become nontrivial problems.
The fact that factorization of large integers is so difficult has been used extensively in
cryptography, especially public key cryptography, i.e., encoding messages whose transmission
cannot be hidden, such as privileged information sent over public access computer lines.
In Chapter 5 we will discuss primality testing and hint at the uses in cryptography. 
The excellent book by Koblitz [Ko] is entirely devoted to the subject. Finally in 
Chapter 6 we discuss primes in algebraic number theory. We introduce the general 
idea of unique factorization and primes and prime ideals in number fields. 

The history of number theory has been very well documented. The book by 
L.E. Dickson The History of the Theory of Numbers [D] gives a comprehensive 
history until the early part of the twentieth century. The book by O. Ore, Number
Theory and its History [O], gives a similar but less comprehensive account
and includes results up to the mid-twentieth century. Another excellent historical
approach is the book by A. Weil, Number Theory: An Approach Through History
from Hammurapi to Legendre [W]. The Chapter Notes in Nathanson’s book Elementary
Methods in Number Theory [N] also provide good historical insights. In
this book we will only touch on the history. For this introduction we give a very brief 
overview of some of the major developments. 

Number theory arises from arithmetic and computations with whole numbers. 
Every culture and society has some method of counting and number representation. 
However it was not until the development of a place value system that symbolic 
computation became truly feasible. The numeration system that we use is called the 
Hindu-Arabic numeration system and was developed in India most likely during the 
period 600-800 A.D. This system was adopted by Arab cultures and transported to 
Europe via Spain. The adoption of this system in Europe and elsewhere was a long 
process and it was not until the Renaissance and after that symbolic computation 
widely superseded the use of abaci and other computing devices. We should remark
that although mathematics is theoretical, abstract results are often delayed until
the necessary computational tools and concepts are available. For example, calculus
and analysis could not have developed without the prior development of the concept
of an irrational number.

Much of the beginnings of number theory came from straightforward observation 
and a great deal of number theoretic information was known to the Babylonians, 
Egyptians, Greeks, Hindus, and other ancient cultures. Greek mathematicians,
especially the Pythagoreans (around 450 B.C.), began to think of numbers as
abstractions and to deal with purely theoretical questions. The foundational material of
number theory (divisors, primes, gcd, lcm, the Euclidean algorithm, the fundamental
theorem of arithmetic, and the infinitude of primes), although not always stated in
modern terms, is all present in Euclid’s Elements. Three of Euclid’s books, Books VII,
VIII, and IX, treat the theory of numbers. It is interesting that Euclid’s treatment of
number theory is still geometric in its motivation and in most of its methods.
It wasn’t until the Alexandrian period, several hundred years later, that arithmetic
was separated from geometry. The book Introductio Arithmeticae by Nicomachus
in the second century A.D. was the first major treatment of arithmetic and the
properties of the whole numbers without geometric recourse. This work was continued by
Diophantus of Alexandria about 250 A.D. His great work Arithmetica is a collection 
of problems and solutions in number theory and algebra. In this work he introduced a 
great deal of algebraic symbolism as well as the topic of equations with indeterminate 
quantities. The attempt to find integral solutions to algebraic equations is now called 
Diophantine analysis in his honor. Fermat’s big theorem on solving $x^n + y^n = z^n$
in integers is an example of a Diophantine problem.

The improvements in computational techniques led mathematicians in the 1500s 
and 1600s to look more deeply at number theoretical questions. The giant of this 
period was Pierre Fermat who made enormous contributions to the theory of numbers. 
Fermat’s work can be considered the beginning of number theory as a
modern discipline. Professionally, Fermat was a lawyer and a judge and essentially
only a mathematical amateur. He published almost nothing and his results and ideas 
are found in his own notes and journals as well as in correspondence with other 
mathematicians. Yet he had a profound effect on almost all branches of mathematics, 
not just number theory. He, as much as Descartes, developed analytic geometry. He 
did major work, prior to Newton and Leibniz, on the foundations of calculus. A series
of letters between Fermat and Pascal established the beginnings of probability theory. 
In number theory, the work he did on factorization, congruences, and representations 
of integers by quadratic forms determined the direction of number theory until the 
nineteenth century. He did not supply proofs for most of his results but almost all of 
his work was subsequently proved (or shown to be false). The most difficult proved
to be his big theorem, which remained unproved until the mid-1990s. The attempts to prove
this big theorem led to many advances in number theory including the development 
of algebraic number theory. 

From the time of Fermat in the mid-seventeenth century through the eighteenth 
century a great deal of work was done in number theory but it was basically a 
series of somewhat disconnected, but often brilliant and startling, results. Important 
contributions were made by Euler, who proved and extended many of Fermat’s results
including Fermat’s Two-Square Theorem (see Section 3.2). Euler also hinted at the 
law of quadratic reciprocity (see Section 2.6). This important result was eventually 
stated in its modern form by Legendre and the first complete proof was given by 
Gauss. During this period, certain problems were either stated or conjectured which 
became the basis for what is now known as additive number theory. The Goldbach 
conjecture and Waring’s problem are two examples. We will not touch much on this 
topic in this book but refer an interested reader to [N]. 

In 1801 Gauss published a treatise on number theory called Disquisitiones
Arithmeticae. This book not only standardized the notation used but also set the tone and
direction for the theory of numbers up until the present. It is often joked that any 
new mathematical result is somehow inherent in the work of Gauss and in the case 
of number theory this is not really that far-fetched. Tremendous ideas and hints of 
things to come are present in Gauss’ Disquisitiones. Gauss’ work on number
theory centered on three main concepts: the theory of congruences (see Chapter 2), the
introduction of algebraic numbers (see Chapter 6), and the theory of forms,
especially quadratic forms, and how these forms represent integers. Gauss, through his
student Dirichlet, was also important in the infancy of analytic number theory. In 
1837 Dirichlet proved, using analytic methods, that there are infinitely many primes 
in any arithmetic progression $\{a + nb : n \in \mathbb{N}\}$ with $a, b$ relatively prime. We will
discuss this result and its proof in Chapter 3. Euler and Legendre had both conjectured
this theorem. Dirichlet’s use of analysis really marks the beginning of analytic
number theory. The central result of analytic number theory, the prime
number theorem, was also conjectured by Gauss, among others, including Euler and
Legendre. This result deals with the asymptotic behavior of the function


$$\pi(x) = \text{number of primes} \le x.$$


The actual result says that

$$\lim_{x \to \infty} \frac{\pi(x)}{x/\ln x} = 1$$
and was proved in 1896 by Hadamard and independently by de la Vallée Poussin.
Both of their proofs used the behavior of the Riemann zeta function 


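$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z},$$


where $z = x + iy$ is a complex variable. (The series converges for $x > 1$; for other
values of $z$, $\zeta(z)$ is defined by analytic continuation, which extends $\zeta$ to the whole
complex plane except for a simple pole at $z = 1$.) Using this function, Riemann in 1859
attempted to prove the prime number theorem. In the attempted proof he hypothesized
that all the zeros $z = x + iy$ of $\zeta(z)$ in the strip $0 < x < 1$ lie along the line
$x = \frac{1}{2}$. This conjecture is known as the Riemann hypothesis and is still an open
question.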
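To make the statement of the prime number theorem concrete, the following Python sketch counts primes with a sieve of Eratosthenes (a standard method we assume here; it plays no role in the proofs discussed above) and prints the ratio $\pi(x)/(x/\ln x)$ for several powers of 10. The helper name `prime_counting` is our own. A numerical table proves nothing about the limit, but it shows how slowly the ratio approaches 1.

```python
import math


def prime_counting(limit: int) -> list[int]:
    """Return a table pi with pi[x] = number of primes <= x for 0 <= x <= limit."""
    if limit < 2:
        return [0] * (limit + 1)
    # Sieve of Eratosthenes: is_prime[n] ends up True exactly when n is prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross off the multiples of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    # A running count of primes gives pi(x) for every x at once.
    table, count = [], 0
    for flag in is_prime:
        count += flag
        table.append(count)
    return table


if __name__ == "__main__":
    pi = prime_counting(10**6)
    for x in (10**2, 10**3, 10**4, 10**5, 10**6):
        ratio = pi[x] / (x / math.log(x))
        print(f"x = {x:>7}  pi(x) = {pi[x]:>5}  pi(x)/(x/ln x) = {ratio:.4f}")
```

For these values of $x$ the ratio stays above 1 (it is still roughly 1.08 at $x = 10^6$), which illustrates why the theorem is stated as a limit rather than as an approximate equality for moderate $x$.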

Algebraic number theory also started basically with the work of Gauss. Gauss 
did an extensive study of the complex integers, that is the complex numbers of the 
form a + bi with a, b integers. Today these are known as the Gaussian integers. 
Gauss proved that they satisfy most of the same properties as the ordinary integers 
including unique factorization into primes. In modern parlance he showed that they 
form a unique factorization domain. Gauss’s algebraic integers were extended in 
many ways in attempts to prove Fermat’s big theorem, and these extensions eventually
developed into algebraic number theory. Kummer, a student of Gauss and Dirichlet, 
introduced in the 1840s a theory of algebraic integers and a set of ideal numbers from 
which unique factorization could be obtained. He used this to prove many cases of 
the Fermat theorem. Dedekind, in the 1870s, developed a further theory of algebraic 
numbers and unique factorization by ideals which extended both Gaussian integers 
and Kummer’s algebraic and ideal numbers. Further work in the same area was done 
by Kronecker in the 1880s. We will discuss algebraic number theory and prime ideals 
in Chapter 6. 

Modern number theory extends and uses all these classical ideas, although there 
have been many major new innovations. The close ties between number theory, 
especially diophantine analysis, and algebraic geometry led to Wiles’ proof of the 
Fermat Theorem and to an earlier proof by Faltings of the Mordell conjecture, which 
is a related result. The vast area of mathematics used in both of these proofs is 
phenomenal. Probabilistic methods were incorporated into number theory by P. Erdős
and studies in this area are known as probabilistic number theory. A great deal of 
recent work has gone into primality testing and factorization of large integers. These 
ideas have been incorporated extensively into cryptography (see [Ko]). 


Chapter 2 
Basic Number Theory 


2.1 The Ring of Integers 


The theory of numbers is concerned with the properties of the integers, that is, the 
class of whole numbers and zero, 0, ±1, ±2, .... We will denote the class of integers
by Z. The positive integers, 1, 2, 3, ..., are called the natural numbers, which we
will denote by N. We will assume that the reader is familiar with the basic arithmetic 
properties of Z and in this section we will look at the abstract algebraic properties 
of the integers and what makes Z unique as an algebraic structure. 

Recall that a ring R is a set with two binary operations, addition, denoted by +, 
and multiplication, denoted by · or just by juxtaposition, defined on it, satisfying the
following six axioms: 


1. Addition is commutative: $a + b = b + a$ for each pair $a, b$ in $R$.

2. Addition is associative: $a + (b + c) = (a + b) + c$ for $a, b, c \in R$.

3. There exists an additive identity, denoted by 0, such that $a + 0 = a$ for each
$a \in R$.

4. For each $a \in R$ there exists an additive inverse, denoted $-a$, such that $a + (-a) =
0$.

5. Multiplication is associative: $a(bc) = (ab)c$ for $a, b, c \in R$.

6. Multiplication is distributive over addition: $a(b + c) = ab + ac$ and $(b + c)a =
ba + ca$ for $a, b, c \in R$.


If in addition R satisfies 
7. Multiplication is commutative: ab = ba for each pair a,b in R 
then R is a commutative ring, while if R satisfies 


8. There exists a multiplicative identity, denoted by 1 (not equal to 0), such that
$a \cdot 1 = 1 \cdot a = a$ for each $a$ in $R$


then R is a ring with an identity. A commutative ring with identity satisfies 1
through 8.
A field K is a commutative ring with an identity in which every nonzero element
has a multiplicative inverse, that is, for each $a \in K$ with $a \neq 0$ there exists an
element $b \in K$ such that $ab = ba = 1$. In this case the set $K^* = K \setminus \{0\}$ forms an
abelian group with respect to the multiplication in K. K* is called the multiplicative 
group of K. 

A ring can be considered as the most basic algebraic structure in which addition, 
subtraction, and multiplication can be done. In any ring the equation x + b = c 
can always be solved. Further a field can be considered as the most basic algebraic 
structure in which addition, subtraction, multiplication, and division can be done. 
Hence in any field, the equation $ax + b = c$ with $a \neq 0$ can always be solved.
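A small Python sketch can illustrate the difference between a ring and a field using the integers and the rationals. Here Python's standard `fractions.Fraction` type stands in for the field of rational numbers; the function names are hypothetical. In Z we can always subtract b, but dividing by a works only when a divides $c - b$.

```python
from fractions import Fraction


def solve_linear_in_Q(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Solve a*x + b = c in the field of rationals (requires a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # Subtracting b is possible in any ring; dividing by a needs a field.
    return (c - b) / a


def solve_linear_in_Z(a: int, b: int, c: int):
    """Return the integer solution of a*x + b = c, or None if Z has none (a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    diff = c - b
    # In the ring Z, "division" by a succeeds only when a divides c - b.
    return diff // a if diff % a == 0 else None


print(solve_linear_in_Q(Fraction(2), Fraction(1), Fraction(2)))  # prints 1/2
print(solve_linear_in_Z(2, 1, 2))                                # prints None
```

For instance, $2x + 1 = 2$ has the solution $x = 1/2$ in the rationals but no solution in Z.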

Combining this definition with our knowledge of Z we get that 


Lemma 2.1.1 The integers Z form a commutative ring with identity. 


There are many examples of such rings (see Exercises), so to define Z uniquely 
we must introduce certain other properties. If two nonzero integers are multiplied 
together then the result is nonzero. This is not always true in a ring. For example, 
consider the set of real-valued functions defined on the interval [0, 1]. Under ordinary
multiplication and addition, these form a ring (see Exercises) with the zero element being
the function which is identically zero. Now let f(x) be zero on $[0, \frac{1}{2}]$ and nonzero
elsewhere and let g(x) be zero on $[\frac{1}{2}, 1]$ and nonzero elsewhere. Then $f(x) \cdot g(x) = 0$
for every $x \in [0, 1]$, but neither is the zero function. We define an integral domain to be a commutative
ring R with an identity and with the property that if $ab = 0$ with $a, b \in R$ then
either a = 0 or b = 0. Two nonzero elements which multiply together to get zero 
are called zero divisors and hence an integral domain is a commutative ring with an 
identity and no zero divisors. Therefore, Z is an integral domain. 
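The zero-divisor example can be checked numerically. The Python sketch below assumes the concrete choices $f(x) = \max(0, x - \frac{1}{2})$ and $g(x) = \max(0, \frac{1}{2} - x)$, which are zero exactly where the text requires, and uses exact rationals (`fractions.Fraction`) to avoid rounding. Sampling at finitely many points only illustrates the claim; the actual argument is that every $x \in [0, 1]$ satisfies $x \le \frac{1}{2}$ or $x \ge \frac{1}{2}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def f(x: Fraction) -> Fraction:
    """Zero on [0, 1/2], positive on (1/2, 1]."""
    return max(Fraction(0), x - HALF)


def g(x: Fraction) -> Fraction:
    """Zero on [1/2, 1], positive on [0, 1/2)."""
    return max(Fraction(0), HALF - x)


def product(x: Fraction) -> Fraction:
    """Pointwise product (f * g)(x), the ring multiplication for functions."""
    return f(x) * g(x)


# Sample the interval [0, 1] at the points k/100.
grid = [Fraction(k, 100) for k in range(101)]
assert all(product(x) == 0 for x in grid)  # f * g vanishes at every sample point
assert any(f(x) != 0 for x in grid)        # but f is not the zero function
assert any(g(x) != 0 for x in grid)        # and neither is g
```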

The integers are also ordered, that is, we can compare any two integers. We abstract 
this idea in the following manner. We say that an integral domain D is an ordered 
integral domain if there exists a distinguished set $D^+$, called the set of positive
elements, with the properties that


(1) The set $D^+$ is closed under addition and multiplication.
(2) If $x \in D$ then exactly one of the following is true:


(a) $x = 0$
(b) $x \in D^+$
(c) $-x \in D^+$.


In any ordered integral domain D we can order the elements in the standard way.
If $x, y \in D$ then $x < y$ means that $(y - x) \in D^+$. With this ordering $D^+$ can clearly
be identified with those $x \in D$ such that $x > 0$. We then get


Lemma 2.1.2 If D is an ordered integral domain then


(1) $x < y$ and $y < z$ imply $x < z$.
(2) If $x, y \in D$ then exactly one of the following holds:


$x = y$ or $x < y$ or $y < x$.
We thus have that the integers are an ordered integral domain. Their uniqueness
as such a structure depends on two additional properties of Z which are equivalent. 


The Inductive Property Let S be a subset of the natural numbers N. Suppose 
$1 \in S$ and S has the property that if $n \in S$ then $(n + 1) \in S$. Then $S = \mathbb{N}$.


The Well-Ordering Property Let S be a nonempty subset of the natural numbers 
N. Then S has a least element. 


Lemma 2.1.3 The inductive property is equivalent to the well-ordering property. 


Proof To prove this we must first assume the inductive property and show that the
well-ordering property holds, and then vice versa. Suppose the inductive property
holds and let S be a nonempty subset of N. We must show that S has a least element.
Let T be the set

$$T = \{x \in \mathbb{N} : x \le s \text{ for all } s \in S\}.$$


Now $1 \in T$ since $S \subseteq \mathbb{N}$. If whenever $x \in T$ it would follow that $(x + 1) \in T$, then
by the inductive property $T = \mathbb{N}$; but then S would be empty (any $s \in S$ would give
$s + 1 \in T$, that is, $s + 1 \le s$), contradicting that S is nonempty. Therefore, there
exists an a with $a \in T$ and $(a + 1) \notin T$. We claim that a is the least element of S.
Now $a \le s$ for all $s \in S$ since $a \in T$. If $a \notin S$ then every $s \in S$ would also satisfy
$(a + 1) \le s$. This would imply that $(a + 1) \in T$, a contradiction. Therefore, $a \in S$
and $a \le s$ for all $s \in S$, and hence a is the least element. Therefore, the inductive
property implies the well-ordering property.
Conversely, suppose that the well-ordering property holds and suppose $1 \in S$ and
whenever $n \in S$ it follows that $(n + 1) \in S$. We must show that $S = \mathbb{N}$. If $S \neq \mathbb{N}$
then $\mathbb{N} \setminus S$ is a nonempty subset of N. Therefore, it must have a least element n.
Since $1 \in S$, we have $n > 1$, so $n - 1 \in \mathbb{N}$, and by the minimality of n, $(n - 1) \in S$.
But then $(n - 1) + 1 = n \in S$ also, which is a contradiction. Therefore,
$\mathbb{N} \setminus S$ is empty and $S = \mathbb{N}$.
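The first half of the proof is constructive: starting from $1 \in T$, we move up through T until we reach an a with $a + 1 \notin T$. For a finite nonempty set S (an assumption of this sketch, so that membership in T can be tested directly) the same steps give a short Python procedure; `least_element` is a hypothetical name.

```python
def least_element(S: set[int]) -> int:
    """Find the least element of a finite nonempty set S of natural numbers by
    walking up through T = {x in N : x <= s for all s in S}, as in the proof."""
    if not S or any(s < 1 for s in S):
        raise ValueError("S must be a nonempty set of natural numbers")

    def in_T(x: int) -> bool:
        return all(x <= s for s in S)

    a = 1  # 1 is in T because S is a subset of N
    while in_T(a + 1):
        # T cannot be all of N when S is nonempty, so this loop terminates.
        a += 1
    # Now a is in T but a + 1 is not; the proof shows that a must lie in S.
    assert a in S
    return a


print(least_element({7, 3, 10}))  # prints 3
```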


The inductive property is of course the basis for inductive proofs which play a 
big role in the theory of numbers. To remind the reader, in an inductive proof we 
want to prove statements P(n) which depend on positive integers. In the induction 
we show that P(1) is true, then show that the truth of P(n) implies the truth of
P(n + 1). From the inductive property, P(n) is then true for all positive integers
n. We give an example which has an ancient history in number theory.


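Example 2.1.1 Show that

$$1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

Here for $n = 1$ we have $1 = \frac{(1)(2)}{2} = 1$. So it is true for $n = 1$. Assume that the
statement is true for $n = k$, that is

$$1 + 2 + \cdots + k = \frac{k(k+1)}{2},$$

and consider $n = k + 1$:

$$1 + 2 + \cdots + k + (k+1) = (1 + 2 + \cdots + k) + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}.$$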
Hence the statement is true for $n = k + 1$ and therefore, by induction, for all $n \in \mathbb{N}$.

[Fig. 2.1 Triangular Numbers: dot triangles for 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4]
The series of integers

$$1,\quad 1 + 2 = 3,\quad 1 + 2 + 3 = 6,\quad 1 + 2 + 3 + 4 = 10,\ \ldots$$

are called the triangular numbers since they are the sums of dots placed in triangular
form as in Figure 2.1. These numbers were studied by the Pythagoreans in Greece in
500 B.C.
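The triangular numbers and the formula of Example 2.1.1 can be compared directly. The Python sketch below (function names are our own) builds the running sums $1, 1 + 2, 1 + 2 + 3, \ldots$ and checks them against $n(n+1)/2$; such a check covers only finitely many n, which is exactly why the inductive proof is needed.

```python
def triangular_numbers(count: int) -> list[int]:
    """Return the first `count` triangular numbers 1, 1+2, 1+2+3, ... as running sums."""
    result, total = [], 0
    for n in range(1, count + 1):
        total += n  # the n-th triangle adds a row of n dots
        result.append(total)
    return result


def closed_form(n: int) -> int:
    """The formula n(n+1)/2 from Example 2.1.1; n(n+1) is even, so // is exact."""
    return n * (n + 1) // 2


print(triangular_numbers(10))  # prints [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
# Compare the running sums with the closed form for n = 1, ..., 1000.
assert all(t == closed_form(n) for n, t in enumerate(triangular_numbers(1000), start=1))
```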

The inductive property is enough to characterize the integers among ordered 
integral domains up to isomorphism. Recall that if R and S are rings, a function 
$f : R \to S$ is a homomorphism if it satisfies:


1. $f(r_1 + r_2) = f(r_1) + f(r_2)$ for $r_1, r_2 \in R$.
2. $f(r_1 r_2) = f(r_1) f(r_2)$ for $r_1, r_2 \in R$.


If f is also a bijection, then f is an isomorphism, and R and S are isomorphic. 
Isomorphic algebraic structures are essentially algebraically the same. We have the 
following theorem. 


Theorem 2.1.1 Let R be an ordered integral domain which satisfies the inductive 
property (replacing N by the set of positive elements in R). Then R is isomorphic 
to Z. 


We outline a proof in the exercises. 
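The two homomorphism conditions above can be spot-checked on finitely many pairs. In the Python sketch below, reduction of an integer to its remainder modulo a hypothetical modulus M = 6 (arithmetic of remainders, assumed here purely for illustration) passes both tests, while the doubling map $n \mapsto 2n$ on Z satisfies condition 1 but fails condition 2. A finite check can refute the homomorphism property but cannot prove it.

```python
from itertools import product


def respects_ring_operations(f, add_R, mul_R, add_S, mul_S, sample) -> bool:
    """Check conditions 1 and 2 on every pair (r1, r2) drawn from `sample`."""
    for r1, r2 in product(sample, repeat=2):
        if f(add_R(r1, r2)) != add_S(f(r1), f(r2)):  # condition 1
            return False
        if f(mul_R(r1, r2)) != mul_S(f(r1), f(r2)):  # condition 2
            return False
    return True


M = 6  # hypothetical modulus, chosen only for illustration


def add_Z(a: int, b: int) -> int:
    return a + b


def mul_Z(a: int, b: int) -> int:
    return a * b


def add_mod(a: int, b: int) -> int:
    return (a + b) % M


def mul_mod(a: int, b: int) -> int:
    return (a * b) % M


sample = range(-20, 21)
# Reduction mod M sends each integer to its remainder in {0, ..., M - 1}.
print(respects_ring_operations(lambda n: n % M, add_Z, mul_Z, add_mod, mul_mod, sample))  # prints True
# Doubling preserves sums but not products, e.g. 2(1 * 1) = 2 while (2 * 1)(2 * 1) = 4.
print(respects_ring_operations(lambda n: 2 * n, add_Z, mul_Z, add_Z, mul_Z, sample))  # prints False
```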


2.2 Divisibility, Primes, and Composites 


The starting point for the theory of numbers is divisibility. 


Definition 2.2.1 If $a, b$ are integers we say that $a$ divides $b$, or that $a$ is a factor or
divisor of $b$, if there exists an integer $q$ such that $b = aq$. We denote this by $a \mid b$;
$b$ is then a multiple of $a$. If $b > 1$ is an integer whose only factors are $\pm 1$ and $\pm b$,
then $b$ is a prime; otherwise $b > 1$ is composite.
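The definition of a prime can be turned directly into a test by trial division. The Python sketch below is a minimal illustration rather than an efficient method; the name `is_prime` is ours, and the sketch relies on the standard fact (not proved in this section) that a composite $b$ has a factor $x$ with $1 < x \le \sqrt{b}$, so only such candidates are tried.

```python
from math import isqrt


def is_prime(b: int) -> bool:
    """Trial-division primality test for an integer b."""
    if b <= 1:
        return False  # primes are by definition greater than 1
    # If b = xy with 1 < x <= y < b, then x * x <= b, so x <= isqrt(b).
    for d in range(2, isqrt(b) + 1):
        if b % d == 0:
            return False  # d is a factor with 1 < d < b, so b is composite
    return True


print([b for b in range(2, 30) if is_prime(b)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```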


The following properties of divisibility are straightforward consequences of the 
definition: 


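Theorem 2.2.1 (1) $a \mid b \implies a \mid bc$ for any integer $c$.
(2) $a \mid b$ and $b \mid c$ imply $a \mid c$.
(3) $a \mid b$ and $a \mid c$ imply that $a \mid (bx + cy)$ for any integers $x, y$.
(4) $a \mid b$ and $b \mid a$ imply that $a = \pm b$.
(5) If $a \mid b$ and $a > 0$, $b > 0$, then $a \le b$.
(6) $a \mid b$ if and only if $ca \mid cb$ for any integer $c \neq 0$.
(7) $a \mid 0$ for all $a \in \mathbb{Z}$, and $0 \mid a$ only for $a = 0$.
(8) $a \mid \pm 1$ only for $a = \pm 1$.
(9) $a_1 \mid b_1$ and $a_2 \mid b_2$ imply that $a_1 a_2 \mid b_1 b_2$.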


Proof We prove (2) and leave the remaining parts to the exercises. 
Suppose $a \mid b$ and $b \mid c$. Then there exist $x, y$ such that $b = ax$ and $c = by$. But then
$c = axy = a(xy)$ and therefore $a \mid c$.


If b, c, x, y are integers then an integer bx + cy is called a linear combination of 
b, c. Thus part (3) of Theorem 2.2.1 says that if a is a common divisor of b, c then 
a divides any linear combination of b and c. 

Further, note that if $b > 1$ is composite then there exist $x > 0$ and $y > 0$, both different
from 1, such that $b = xy$, and from part (5) we must have $1 < x < b$ and $1 < y < b$.

In ordinary arithmetic, given a, b we can always attempt to divide a into b. The 
next theorem, called the division algorithm, says that if a > 0 either a will divide 
b or the remainder of the division of b by a will be less than a. 


Theorem 2.2.2 (Division Algorithm) Given integers $a, b$ with $a > 0$, there
exist unique integers $q$ and $r$ such that $b = qa + r$ where either $r = 0$ or $0 < r < a$.

One may think of $q$ and $r$ as the quotient and remainder, respectively, when
dividing $b$ by $a$.


Proof Given $a, b$ with $a > 0$ consider the set
$$S = \{b - qa \ge 0;\ q \in \mathbb{Z}\}.$$

If $b \ge 0$ then $b + a = b - (-1)a > 0$ and so this number is in $S$. If $b < 0$ then there
exists a $q > 0$ with $-qa < b$. Then $b + qa > 0$ and is in $S$. Therefore, in either case $S$ is
nonempty. Hence $S$ is a nonempty subset of $\mathbb{N} \cup \{0\}$ and therefore has a least element
$r$; write $r = b - qa$. If $r \neq 0$ we must show that $0 < r < a$. Suppose $r \ge a$; then
$r = a + x$ with $x \ge 0$, and $x < r$ since $a > 0$. Then $b - qa = r = a + x \implies b - (q+1)a = x$.
This means that $x \in S$. Since $x < r$ this contradicts the minimality of $r$.
Therefore, if $r \neq 0$ it follows that $0 < r < a$.
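The only thing left is to show the uniqueness of $q$ and $r$. Suppose $b = q_1 a + r_1$
also, with $r_1 = 0$ or $0 < r_1 < a$. Then $r_1 \in S$, and every element of $S$ has the form
$b - q'a = r_1 + (q_1 - q')a$; since $0 \le r_1 < a$, such an element is negative when $q' > q_1$ and
is at least $r_1$ when $q' \le q_1$. Hence $r_1$ must also be the minimal element of $S$, so
$r_1 \le r$ and $r \le r_1$, that is, $r = r_1$. Now

$$b - qa = b - q_1 a \implies (q_1 - q)a = 0,$$

but since $a > 0$ it follows that $q_1 - q = 0$, so that $q = q_1$.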
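The existence part of the proof can be mirrored in code: start from some element of $S$ and step $q$ upward until the next value $b - (q+1)a$ would be negative, which leaves the least element $r$ of $S$. The Python sketch below (the name `division_algorithm` is ours) does this with a deliberately naive loop and compares the result with Python's built-in `divmod`, which for a positive divisor already returns a remainder with $0 \le r < a$.

```python
def division_algorithm(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b = q*a + r and 0 <= r < a, assuming a > 0."""
    if a <= 0:
        raise ValueError("the divisor a must be positive")
    # Choose q so that b - q*a >= 0, i.e. b - q*a is an element of S.
    q = -abs(b)
    # Move to the least element of S: stop once b - (q + 1)*a would be negative.
    while b - (q + 1) * a >= 0:
        q += 1
    return q, b - q * a


for b, a in [(2412, 270), (17, 5), (-17, 5), (0, 3)]:
    assert division_algorithm(b, a) == divmod(b, a)

print(division_algorithm(-17, 5))  # (-4, 3), since -17 = (-4)(5) + 3
```

The loop takes roughly $|b|$ steps, so it is only meant to illustrate the proof; in practice `divmod` (or `//` and `%`) is the natural choice.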


The next ideas that are necessary are the concepts of greatest common divisor 
and least common multiple. 


Definition 2.2.2 Given nonzero integers $a, b$, their greatest common divisor or
GCD is a positive integer $d$ which is a common divisor, that is, $d \mid a$ and $d \mid b$, and such that
if $d_1$ is any other common divisor then $d_1 \mid d$. We denote the greatest common divisor
of $a, b$ by either $\gcd(a, b)$ or $(a, b)$.

The next result says that any two nonzero integers have a greatest
common divisor and that it is unique.


Theorem 2.2.3 Given nonzero integers a, b their GCD exists, is unique, and can be 
characterized as the least positive linear combination of a and b. 


Proof Given nonzero $a, b$ consider the set
$$S = \{ax + by > 0;\ x, y \in \mathbb{Z}\}.$$

Now $a^2 + b^2 > 0$, so $S$ is a nonempty subset of $\mathbb{N}$ and hence has a least element
$d > 0$. We show that $d$ is the GCD.

First, we must show that $d$ is a common divisor. Now $d = ax + by$ and is the least
such positive linear combination. By the division algorithm $a = qd + r$ with $0 \le r < d$.
Suppose $r \neq 0$. Then $r = a - qd = a - q(ax + by) = (1 - qx)a - qyb > 0$.
Hence $r$ is a positive linear combination of $a$ and $b$ and therefore is in $S$. But then
$r < d$ contradicts the minimality of $d$ in $S$. It follows that $r = 0$, so $a = qd$
and $d \mid a$. An identical argument shows that $d \mid b$ and so $d$ is a common divisor of $a$
and $b$. Let $d_1$ be any other common divisor of $a$ and $b$. Then $d_1$ divides any linear
combination of $a$ and $b$ and so $d_1 \mid d$. Therefore, $d$ is the GCD of $a$ and $b$.

Finally, we must show that $d$ is unique. Suppose $d_1$ is another GCD of $a$ and
$b$. Then $d_1 > 0$ and $d_1$ is a common divisor of $a, b$. Then $d_1 \mid d$ since $d$ is a GCD.
Identically $d \mid d_1$ since $d_1$ is a GCD. Therefore, $d = \pm d_1$, and then $d = d_1$ since they
are both positive.


We note as a consequence of Theorem 2.2.3 that if $a, b, k$ are nonzero integers
then the equation $ax + by = k$ has integer solutions $x, y$ if and only if $(a, b)$ divides $k$.

If (a, b) = 1 then we say that a, b are relatively prime or coprime. It follows 
that a and b are relatively prime if and only if 1 is expressible as a linear combination 
of a and b. We need the following three results: 
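Lemma 2.2.1 If $d = (a, b)$ then $a = a_1 d$ and $b = b_1 d$ with $(a_1, b_1) = 1$.

Proof If $d = (a, b)$ then $d \mid a$ and $d \mid b$. Hence $a = a_1 d$ and $b = b_1 d$. By Theorem 2.2.3
there are integers $x, y$ with
$$d = ax + by = a_1 d x + b_1 d y.$$
Dividing both sides of the equation by $d$ we obtain

$$1 = a_1 x + b_1 y.$$

Since 1 is the least positive integer, it is the least positive linear combination of $a_1$
and $b_1$. Therefore, by Theorem 2.2.3, $(a_1, b_1) = 1$.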


Lemma 2.2.2 For any integer c we have that (a, b) = (a,b + ac). 


Proof Suppose $(a, b) = d$ and $(a, b + ac) = d_1$. Now $d$ is the least positive linear
combination of $a$ and $b$. Also, $d_1$ is a linear combination of $a$ and $b + ac$,
so that

$$d_1 = ar + (b + ac)s = a(cs + r) + bs.$$

Hence $d_1$ is also a positive linear combination of $a$ and $b$, and therefore $d_1 \ge d$. On the other
hand, $d_1 \mid a$ and $d_1 \mid (b + ac)$, and so $d_1 \mid b$ since $b = (b + ac) - ac$. Therefore, $d_1 \mid d$,
so $d_1 \le d$. Combining these we must have $d_1 = d$.


It is also easy to see that $(a, b) = a$ if $a, b$ are nonzero integers with $a > 0$ and $a \mid b$.
The next result, called the Euclidean algorithm, provides a technique both for
finding the GCD of two integers and for expressing the GCD as a linear combination of them.


Theorem 2.2.4 (The Euclidean Algorithm) Given integers $b$ and $a > 0$ with $a \nmid b$,
form the repeated divisions

$$b = q_1 a + r_1, \quad 0 < r_1 < a$$
$$a = q_2 r_1 + r_2, \quad 0 < r_2 < r_1$$
$$\vdots$$
$$r_{n-2} = q_n r_{n-1} + r_n, \quad 0 < r_n < r_{n-1}$$
$$r_{n-1} = q_{n+1} r_n.$$

The last nonzero remainder $r_n$ is the GCD of $a, b$. Further, $r_n$ can be expressed as a
linear combination of $a$ and $b$ by successively eliminating the $r_i$'s in the intermediate
equations.


Proof In taking the successive divisions as outlined in the statement of the theorem,
each remainder $r_i$ gets strictly smaller and is still nonnegative. Hence the process must finally
end with a zero remainder. Therefore, there is a last nonzero remainder $r_n$. We must
show that this is the GCD.
Now from Lemma 2.2.2, the GCD satisfies

$$(a, b) = (a, b - q_1 a) = (a, r_1) = (r_1, a - q_2 r_1) = (r_1, r_2).$$

Continuing in this manner, we have $(a, b) = (r_{n-1}, r_n) = r_n$ since $r_n$ divides
$r_{n-1}$. This shows that $r_n$ is the GCD.
To express $r_n$ as a linear combination of $a$ and $b$, notice first that

$$r_n = r_{n-2} - q_n r_{n-1}.$$

Substituting $r_{n-1} = r_{n-3} - q_{n-1} r_{n-2}$ from the immediately preceding division we get

$$r_n = r_{n-2} - q_n (r_{n-3} - q_{n-1} r_{n-2}) = (1 + q_n q_{n-1}) r_{n-2} - q_n r_{n-3}.$$

Doing this successively, we ultimately express $r_n$ as a linear combination of $a$ and
$b$.


EXAMPLE 2.2.1 Find the GCD of 270 and 2412 and express it as a linear 
combination of 270 and 2412. 
We apply the Euclidean algorithm 
$$2412 = (8)(270) + 252$$
$$270 = (1)(252) + 18$$
$$252 = (14)(18)$$
Therefore, the last nonzero remainder is 18, which is the GCD. We now must express
18 as a linear combination of 270 and 2412.
From the first equation
$$252 = 2412 - (8)(270),$$
which, substituted into the second equation, gives

$$270 = (2412 - (8)(270)) + 18 \implies 18 = (-1)(2412) + (9)(270),$$


which is the desired linear combination. 
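The back-substitution above can be organized so that the coefficients are updated during the forward divisions, a standard variant usually called the extended Euclidean algorithm. The Python sketch below is one such formulation (the function name `extended_gcd` is ours, not the text's); it returns $d = (a, b)$ together with integers $x, y$ satisfying $ax + by = d$.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y == d."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r  # quotient of the current division
        old_r, r = r, old_r - q * r  # next remainder, as in Theorem 2.2.4
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize the sign so that the GCD is positive
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


print(extended_gcd(270, 2412))  # (18, 9, -1), i.e. 18 = (9)(270) + (-1)(2412)
```

On the pair from Example 2.2.1 it performs the same divisions $2412 = (8)(270) + 252$, $270 = (1)(252) + 18$, $252 = (14)(18)$ and recovers the same coefficients.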


Now suppose that $d = (a, b)$ where $a, b \in \mathbb{Z}$ and $a \neq 0$, $b \neq 0$. Then we note
that given one integer solution of the equation

$$ax + by = d$$

we can easily obtain all solutions.
Suppose without loss of generality that $d = 1$, that is, $a, b$ are relatively prime. If
not, we can divide through by $d > 1$ and work with $a/d$ and $b/d$, which are relatively
prime by Lemma 2.2.1. Suppose that $x_1, y_1$ and $x_2, y_2$ are two integer
solutions of the equation $ax + by = 1$, that is,

$$ax_1 + by_1 = 1$$
$$ax_2 + by_2 = 1.$$

Then
$$a(x_1 - x_2) = -b(y_1 - y_2).$$

Since $(a, b) = 1$ we get from Lemma 2.2.3 that $b \mid (x_1 - x_2)$ and hence $x_2 = x_1 + bt$
for some $t \in \mathbb{Z}$. Substituting back into the equations, we then get

$$ax_1 + by_1 = a(x_1 + bt) + by_2 \implies by_1 = abt + by_2.$$

Therefore, $y_2 = y_1 - at$. Conversely, every such pair is a solution, since
$a(x_1 + bt) + b(y_1 - at) = ax_1 + by_1 = 1$. Hence all solutions are given by
$$x_2 = x_1 + bt$$
$$y_2 = y_1 - at$$
for some $t \in \mathbb{Z}$.
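The parametrization of all solutions can be checked numerically. The Python sketch below is illustrative; the helper names `particular_solution` and `all_solutions` are ours. It finds one solution of $ax + by = 1$ with a recursive form of the Euclidean algorithm (assuming $(a, b) = 1$) and then generates the pairs $(x_1 + bt, y_1 - at)$. The usage lines apply it to $270/18 = 15$ and $2412/18 = 134$, the relatively prime pair obtained from Example 2.2.1 by dividing through by the GCD.

```python
def particular_solution(a: int, b: int) -> tuple[int, int]:
    """Return one (x, y) with a*x + b*y == 1, assuming gcd(a, b) == 1."""
    if b == 0:
        if a in (1, -1):
            return a, 0  # a*a == 1
        raise ValueError("a and b must be relatively prime")
    q, r = divmod(a, b)  # one division step: a = q*b + r
    x, y = particular_solution(b, r)  # b*x + r*y == 1
    return y, x - q * y  # a*y + b*(x - q*y) == b*x + r*y == 1


def all_solutions(a: int, b: int, ts: range) -> list[tuple[int, int]]:
    """The solutions (x1 + b*t, y1 - a*t) of a*x + b*y = 1 for t in ts."""
    x1, y1 = particular_solution(a, b)
    return [(x1 + b * t, y1 - a * t) for t in ts]


print(particular_solution(15, 134))  # (9, -1)
solutions = all_solutions(15, 134, range(-3, 4))
assert all(15 * x + 134 * y == 1 for x, y in solutions)
```

The particular solution $(9, -1)$ is exactly the coefficient pair of Example 2.2.1, now for the reduced equation $15x + 134y = 1$.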


The final idea of this section is that of a least common multiple. 


Definition 2.2.3 Given nonzero integers $a, b$, their least common multiple or LCM
is a positive integer $m$ which is a common multiple, that is, $a \mid m$ and $b \mid m$, and such that
if $m_1$ is any other common multiple then $m \mid m_1$. We denote the least common multiple
of $a, b$ by either $\operatorname{lcm}(a, b)$ or $[a, b]$.

As for GCDs, any two nonzero integers have a least common multiple,
and it is unique. First, we need the following result known as Euclid’s Lemma. In
the next section, we will use a special case of this applied to primes. We note that
this special case is traditionally also called Euclid’s lemma.


Lemma 2.2.3 (Euclid’s Lemma) Suppose $a \mid bc$ and $(a, b) = 1$; then $a \mid c$.

Proof Suppose $(a, b) = 1$. Then 1 is expressible as a linear combination of $a$ and $b$.
That is,
$$ax + by = 1.$$

Multiply by $c$, so that
$$acx + bcy = c.$$

Now $a \mid a$ and $a \mid bc$, so $a$ divides the linear combination $acx + bcy$ and
hence $a \mid c$.
Theorem 2.2.5 Given nonzero integers $a, b$, their LCM exists and is unique. Further
we have
$$(a, b)[a, b] = |ab|,$$
so in particular $(a, b)[a, b] = ab$ when $a, b > 0$.

Proof Since $a, b$ and $|a|, |b|$ have the same common divisors and the same common
multiples, we may assume $a, b > 0$. Let $d = (a, b)$ and let $m = \frac{ab}{d}$. We show that $m$ is
the LCM. Now $a = a_1 d$, $b = b_1 d$ with $(a_1, b_1) = 1$. Then $m = a_1 b_1 d$. Since $a = a_1 d$,
$m = b_1 a$, so $a \mid m$. Identically, $b \mid m$, so $m$ is a common multiple. Now let $m_1$ be another
common multiple, so that $m_1 = ax = by$. We then get

$$a_1 d x = b_1 d y \implies a_1 x = b_1 y \implies a_1 \mid b_1 y.$$
But $(a_1, b_1) = 1$, so from Lemma 2.2.3 $a_1 \mid y$. Hence $y = a_1 z$. It follows then that
$$m_1 = b_1 d (a_1 z) = a_1 b_1 d z = mz$$
and hence $m \mid m_1$. Therefore, $m$ is an LCM.
The uniqueness follows in the same manner as the uniqueness of GCDs. Suppose
$m_1$ is another LCM; then $m \mid m_1$ and $m_1 \mid m$, so $m = \pm m_1$, and since they are both
positive $m = m_1$.


EXAMPLE 2.2.2 Find the LCM of 270 and 2412. 
From Example 2.2.1, we found that (270, 2412) = 18. Therefore, 


$$[270, 2412] = \frac{(270)(2412)}{(270, 2412)} = \frac{(270)(2412)}{18} = 36180.$$
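Theorem 2.2.5 reduces the LCM to a GCD computation. A minimal Python sketch, assuming the standard library's `math.gcd` (recent Python versions, 3.9 and later, also provide `math.lcm` directly); the helper name `lcm` is ours, and the absolute value takes care of signs for nonzero inputs:

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of nonzero a, b, computed as |ab| / (a, b)."""
    if a == 0 or b == 0:
        raise ValueError("a and b must be nonzero")
    return abs(a * b) // gcd(a, b)


print(gcd(270, 2412), lcm(270, 2412))  # 18 36180
```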


2.3 The Fundamental Theorem of Arithmetic


In this section, we prove the fundamental theorem of arithmetic, which is really the
most basic number theoretic result. This result says that any integer $n > 1$ can be
decomposed into prime factors in an essentially unique manner. First, we show that
there always exists such a decomposition into prime factors.


Lemma 2.3.1 Any integer n > 1 can be expressed as a product of primes, perhaps 
with only one factor. 


Proof The proof is by induction. $n = 2$ is prime, so the statement is true at the lowest level.
Suppose that every integer $k$ with $2 \le k < n$ can be decomposed into prime factors; we
must show that $n$ then also has a prime factorization.

If $n$ is prime then we are done. Suppose then that $n$ is composite. Hence $n = m_1 m_2$
with $1 < m_1 < n$, $1 < m_2 < n$. By the inductive hypothesis both $m_1$ and $m_2$ can be
expressed as products of primes. Therefore, $n$ can also be expressed as a product of
primes, by combining the prime factorizations of $m_1$ and $m_2$, completing the proof.
Before we continue to the fundamental theorem, we mention that this result can be
used to prove that the set of primes is infinite. The proof we give goes back to Euclid
and is quite straightforward. In the next chapter, we will present a whole collection
of other proofs, some quite complicated, that the set of primes is infinite. Each
of these other proofs, however, will shed more light on the nature of the integers.


Theorem 2.3.1 There are infinitely many primes. 


Proof Suppose that there are only finitely many primes $p_1, \ldots, p_n$. Each of these is
positive, so we can form the positive integer

$$N = p_1 p_2 \cdots p_n + 1.$$

Since $N > 1$, Lemma 2.3.1 shows that $N$ has a prime decomposition. In particular, there is a
prime $p$ which divides $N$. Then

$$p \mid (p_1 p_2 \cdots p_n + 1).$$

Since the only primes are assumed to be $p_1, p_2, \ldots, p_n$, it follows that $p = p_i$ for some
$i = 1, \ldots, n$. But then $p \mid p_1 p_2 \cdots p_i \cdots p_n$, so $p$ cannot divide $p_1 \cdots p_n + 1$
(otherwise $p$ would divide 1), which is a contradiction. Therefore, $p$ is not one of the
given primes, showing that the list of primes must be endless.
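Euclid's construction can be tried out numerically. The Python sketch below is our own illustration (`smallest_prime_factor` is a hypothetical helper name, not from the text): it forms $N = p_1 \cdots p_n + 1$ for the first six primes and finds a prime divisor of $N$. Note that $N$ itself need not be prime: here $N = 30031 = 59 \cdot 509$, yet its prime factor $59$ is still missing from the list, exactly as the proof predicts.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n > 1 (the least divisor d > 1 of n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to the square root, so n itself is prime


primes = [2, 3, 5, 7, 11, 13]
N = prod(primes) + 1
p = smallest_prime_factor(N)
print(N, p)  # 30031 59
assert p not in primes  # a prime missing from the list, as in the proof
```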


A variation of Euclid’s argument gives the following proof of Theorem 2.3.1.
Suppose there are only finitely many primes $p_1, \ldots, p_n$. Certainly $n \ge 2$, since 2 and 3
are primes. Let $P = \{p_1, \ldots, p_n\}$. Divide $P$ into two disjoint nonempty subsets $P_1, P_2$. Now
consider the number $m = q_1 + q_2$ where $q_1$ is the product of the primes in $P_1$ and $q_2$ is the
product of the primes in $P_2$. Let $p$ be a prime divisor of $m$. Since $p \in P$ it follows that $p$
divides exactly one of $q_1$ and $q_2$. But then $p$ does not divide $m = q_1 + q_2$, a contradiction.
Therefore, $p$ is not one of the given primes and the number of primes must be infinite.

Although there are infinitely many primes, a glance at the list of primes shows
that they appear to become scarcer as the integers get larger. If we let

$$\pi(x) = \text{number of primes} \le x,$$

a basic question is what the asymptotic behavior of this function is. This question
is the basis of the prime number theorem, which will be discussed in Chapter 4.
However, it is easy to show that there are arbitrarily large gaps within the
set of primes.


Theorem 2.3.2 Given any positive integer $k$, there exist $k$ consecutive composite
integers.

Proof Consider the sequence
$$(k+1)! + 2,\ (k+1)! + 3,\ \ldots,\ (k+1)! + k + 1.$$

Suppose $n$ is an integer with $2 \le n \le k + 1$. Then $n \mid (k+1)!$ and hence $n \mid ((k+1)! + n)$.
Since $1 < n < (k+1)! + n$, each of the integers in the above sequence is composite.
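The construction in the proof of Theorem 2.3.2 is explicit, so it is easy to generate. The Python sketch below (the name `composite_run` is ours) lists the $k$ integers $(k+1)! + 2, \ldots, (k+1)! + k + 1$ and checks that each is divisible by the corresponding $n$. These runs are not the first ones of their length; for $k = 5$, for example, $24, 25, 26, 27, 28$ are already five consecutive composites.

```python
from math import factorial


def composite_run(k: int) -> list[int]:
    """The k integers (k+1)! + 2, ..., (k+1)! + k + 1 from Theorem 2.3.2."""
    base = factorial(k + 1)
    return [base + n for n in range(2, k + 2)]


run = composite_run(5)
print(run)  # [722, 723, 724, 725, 726]
# The entry (k+1)! + n is divisible by n, and 1 < n < (k+1)! + n.
assert all(x % n == 0 for n, x in zip(range(2, 7), run))
```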
To show the uniqueness of the prime decomposition we need Euclid’s Lemma,
from the previous section, applied to primes.

Lemma 2.3.2 (Euclid’s Lemma) If $p$ is a prime and $p \mid ab$ then $p \mid a$ or $p \mid b$.

Proof Suppose $p \mid ab$. If $p$ does not divide $a$, then $a$ and $p$ must be relatively
prime, that is, $(a, p) = 1$, since the only positive divisors of $p$ are 1 and $p$. Then from
Lemma 2.2.3, $p \mid b$.


We now state and prove the fundamental theorem of arithmetic. 


Theorem 2.3.3 (The Fundamental Theorem of Arithmetic) Given any integer
$n \neq 0$ there is a factorization

$$n = c\, p_1 p_2 \cdots p_k$$

where $c = \pm 1$ and $p_1, \ldots, p_k$ are primes. Further, this factorization is unique up to
the ordering of the factors.


Proof We assume that $n > 0$. If $n < 0$ we use $c = -1$ and the proof is the same.
We define the product of no primes, that is, when $k = 0$, to be 1. Then the statement
certainly holds for $n = 1$ with $k = 0$. Now suppose $n > 1$. From Lemma 2.3.1, $n$
has a prime decomposition

$$n = p_1 p_2 \cdots p_m.$$

We must show that this is unique up to the ordering of the factors. Suppose then that
$n$ has another such factorization $n = q_1 q_2 \cdots q_k$ with the $q_i$ all prime. We must show
that $m = k$ and that the primes are the same. Now we have

$$n = p_1 p_2 \cdots p_m = q_1 q_2 \cdots q_k.$$

Assume, without loss of generality, that $k \ge m$. Then it follows that $p_1 \mid q_1 q_2 \cdots q_k$. From
Lemma 2.3.2, applied repeatedly, we must have $p_1 \mid q_i$ for some $i$. But $q_i$ is prime and
$p_1 > 1$, so it follows that $p_1 = q_i$. Therefore, we can eliminate $p_1$ and $q_i$ from both sides
of the factorization to obtain

$$p_2 \cdots p_m = q_1 \cdots q_{i-1} q_{i+1} \cdots q_k.$$

Continuing in this manner, we can eliminate all the $p_j$ from the left side of the
factorization to obtain
$$1 = q_{i_1} \cdots q_{i_t}, \quad \text{with } t = k - m.$$

If $t > 0$ this would be impossible, since $q_{i_1}, \ldots, q_{i_t}$ are primes and hence greater than 1.
Therefore, $m = k$ and each prime $p_i$ was included in the primes $q_1, \ldots, q_k$ and vice versa.
Therefore, the factorizations differ only in the order of the factors, proving the theorem.


For any positive integer $n > 1$ we can collect equal primes to write

$$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{with } p_1 < p_2 < \cdots < p_k.$$
This is called the standard prime decomposition. Note that given any two positive
integers $a, b$, we can always write their prime decompositions using the same primes by
allowing zero exponents.
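To compute a standard prime decomposition in practice, one simple (and, for large $n$, very slow) method is repeated trial division. The Python sketch below is an illustration under that choice, not a method taken from the text; `prime_decomposition` is our name, and it returns a dictionary mapping each prime $p_i$ to its exponent $e_i$, with the primes in increasing order.

```python
def prime_decomposition(n: int) -> dict[int, int]:
    """Standard prime decomposition of n > 1 as {prime: exponent}."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:  # divide out p as often as possible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # n has no prime factor below p and p*p > n, so n itself is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_decomposition(270))   # {2: 1, 3: 3, 5: 1}
print(prime_decomposition(2412))  # {2: 2, 3: 2, 67: 1}
```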

There are several easy consequences of the fundamental theorem. 


Theorem 2.3.4 Let $a, b$ be positive integers $> 1$. Suppose

$$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$$
$$b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$$

where we include zero exponents for noncommon primes. Then

$$(a, b) = p_1^{\min(e_1, f_1)} p_2^{\min(e_2, f_2)} \cdots p_k^{\min(e_k, f_k)}$$
$$[a, b] = p_1^{\max(e_1, f_1)} p_2^{\max(e_2, f_2)} \cdots p_k^{\max(e_k, f_k)}.$$

Corollary 2.3.1 Let $a, b$ be positive integers $> 1$; then $(a, b)[a, b] = ab$.


We leave the proofs to the exercises but give an example. 


EXAMPLE 2.3.1 Find the standard prime decompositions of 270 and 2412 and 
use them to find the GCD and LCM. 

Recall that we found the GCD and LCM of these numbers in the previous section
using the Euclidean algorithm. We note that, in general, as the numbers get larger it
becomes very difficult to determine the actual prime decomposition, and even deciding
whether a given number is prime is a nontrivial problem. We will discuss primality testing in Chapter 5.

To find the prime decomposition we factor and then continue refactoring until 
there are only prime factors. 


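$$270 = (27)(10) = 3^3 \cdot 2 \cdot 5 = 2 \cdot 3^3 \cdot 5,$$
which is the standard prime decomposition of 270.
$$2412 = 4 \cdot 603 = 4 \cdot 3 \cdot 201 = 4 \cdot 3 \cdot 3 \cdot 67 = 2^2 \cdot 3^2 \cdot 67,$$
which is the standard prime decomposition of 2412. Hence, writing both with the same primes, we have
$$270 = 2^1 \cdot 3^3 \cdot 5^1 \cdot 67^0$$
$$2412 = 2^2 \cdot 3^2 \cdot 5^0 \cdot 67^1$$

$$\implies (270, 2412) = 2^1 \cdot 3^2 \cdot 5^0 \cdot 67^0 = 2 \cdot 3^2 = 18$$
and

$$[270, 2412] = 2^2 \cdot 3^3 \cdot 5 \cdot 67 = 36180.$$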
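Theorem 2.3.4 can also be applied mechanically. The Python sketch below is an illustration; `exponents` and `gcd_lcm_from_exponents` are our own names, and the factorization uses plain trial division, which is only practical for small inputs. A `collections.Counter` returns 0 for a missing prime, which plays the role of the zero exponents for noncommon primes.

```python
from collections import Counter


def exponents(n: int) -> Counter:
    """Prime exponents of n > 1, found by trial division."""
    result: Counter = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            result[p] += 1
            n //= p
        p += 1
    if n > 1:
        result[n] += 1  # the remaining factor is prime
    return result


def gcd_lcm_from_exponents(a: int, b: int) -> tuple[int, int]:
    """(a, b) and [a, b] via minimum and maximum exponents (Theorem 2.3.4)."""
    ea, eb = exponents(a), exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):  # a missing prime has exponent 0 in a Counter
        g *= p ** min(ea[p], eb[p])
        l *= p ** max(ea[p], eb[p])
    return g, l


print(gcd_lcm_from_exponents(270, 2412))  # (18, 36180)
```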


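Note that the fundamental theorem of arithmetic can be extended to the rational
numbers. Suppose $r = \frac{a}{b}$ with $a > 0$, $b > 0$ is a positive rational. Writing $a$ and $b$ with
the same primes (allowing zero exponents), we get

$$r = \frac{a}{b} = \frac{p_1^{e_1} \cdots p_k^{e_k}}{p_1^{f_1} \cdots p_k^{f_k}} = p_1^{e_1 - f_1} \cdots p_k^{e_k - f_k}.$$

Therefore, any positive rational has a standard prime decomposition

$$p_1^{g_1} \cdots p_k^{g_k}$$ where $g_1, \ldots, g_k$ are integers.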
So, for example,
$$\frac{8}{49} = 2^3 \cdot 7^{-2}.$$


This has the following interesting consequence. 


Lemma 2.3.3 If $a$ is a positive integer which is not a perfect $n$th power, then the $n$th root
of $a$ is irrational.


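Proof This result says, for example, that if a positive integer is not a perfect square then its
square root is irrational. The fact that the square root of 2 is irrational was known to
the Greeks.

Suppose $b$ is a positive integer with standard prime decomposition

$$b = p_1^{e_1} \cdots p_k^{e_k}.$$
Then
$$b^n = p_1^{n e_1} \cdots p_k^{n e_k}$$

and this must be the standard prime decomposition for $b^n$. It follows that an integer
$a > 1$ is an $n$th power if and only if it has a standard prime decomposition

$$a = q_1^{f_1} \cdots q_t^{f_t} \quad \text{with } n \mid f_i \text{ for every } i.$$

Suppose $a$ is not an $n$th power; then

$$a = q_1^{f_1} \cdots q_t^{f_t}$$

where $n$ does not divide $f_i$ for some $i$. Taking the $n$th root,

$$a^{1/n} = q_1^{f_1/n} \cdots q_t^{f_t/n}.$$

But $f_i/n$ is not an integer, so $a^{1/n}$ cannot be rational by the extension of the fundamental
theorem to rationals: if $a^{1/n} = r$ were rational, with standard prime decomposition
$r = s_1^{g_1} \cdots s_u^{g_u}$ and integer exponents $g_j$, then $a = r^n = s_1^{n g_1} \cdots s_u^{n g_u}$, and by
uniqueness of the decomposition every exponent $f_i$ of $a$ would be a multiple of $n$, a contradiction.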


While induction and least well-ordering characterize the integers, unique factor- 
ization into primes does not. We close this section with a brief further discussion of 
unique factorization. 

The concepts of divisor and factor can be extended to any ring: in a ring $R$, $a \mid b$
if there is a $c \in R$ with $b = ac$. We will restrict ourselves to integral domains. A
unit in an integral domain is an element $e$ with a multiplicative inverse. This means
that there is an element $e_1$ in $R$ with $e e_1 = 1$. Thus the only units in $\mathbb{Z}$ are $\pm 1$. Two
elements $r, r_1$ of an integral domain are associates if $r = e r_1$ for some unit $e$. A
prime in a general integral domain is a nonzero nonunit element whose only divisors are units
and associates of itself. With these definitions, we can talk about factorization into primes.

We say that an integral domain $D$ is a unique factorization domain or UFD if
for each $d \in D$ either $d = 0$, $d$ is a unit, or $d$ has a factorization into primes
which is unique up to ordering and unit factors. This means that if

$$d = p_1 \cdots p_m = q_1 \cdots q_k$$

then $m = k$ and, after reordering, each $p_i$ is an associate of $q_i$.

The fundamental theorem of arithmetic in more general algebraic language says 
that the integers Z are a unique factorization domain. However, they are far from 
being the only one. In the exercises, we outline a proof of the following. 


Theorem 2.3.5 Let F be a field and F[x] the ring of polynomials in one variable 
over F. Then F[x] is a UFD. 


This theorem is actually a special case of something even more general. An integral
domain $D$ is called a Euclidean domain if there exists a function
$N : D \setminus \{0\} \to \mathbb{N} \cup \{0\}$ satisfying:

For each $a, b \in D$ with $a \neq 0$ there exist $q, r \in D$ such that
$b = aq + r$ and either $r = 0$, or $r \neq 0$ and $N(r) < N(a)$.


Theorem 2.3.6 Any Euclidean domain is a UFD. 


The proof of this essentially mimics the proof for the integers. See the exercises. 


The Gaussian integers Z[i] are the complex numbers a + bi where a, b are 
integers. 


Lemma 2.3.4 The integers $\mathbb{Z}$, the Gaussian integers $\mathbb{Z}[i]$, and the ring of polynomials
$F[x]$ over a field $F$ are all Euclidean domains.

Corollary 2.3.2 $\mathbb{Z}[i]$ and $F[x]$, with $F$ a field, are UFDs.


Proofs of these results will be given in Chapter 6. 
2.4 Congruences and Modular Arithmetic


Gauss based much of his number theoretical investigations around the theory of 
congruences. As we will see a congruence is just a statement about divisibility put 
into a more formal framework. In this section and the remainder of the chapter, we 
will consider congruences and in particular the solution of polynomial congruences. 
First, we give the basic definitions and properties. 


2.4.1 Basic Theory of Congruences 


Definition 2.4.1 Suppose $m$ is a positive integer. If $x, y$ are integers such that $m \mid (x - y)$,
we say that $x$ is congruent to $y$ modulo $m$ and denote this by $x \equiv y \bmod m$. If
$m$ does not divide $x - y$ then $x$ and $y$ are incongruent modulo $m$.

If $x \equiv y \bmod m$ then $y$ is called a residue of $x$ modulo $m$. Given $x \in \mathbb{Z}$, the set
of integers $\{y \in \mathbb{Z};\ x \equiv y \bmod m\}$ is called the residue class of $x$ modulo $m$. We
denote this by $[x]$. Notice that $x \equiv 0 \bmod m$ is equivalent to $m \mid x$. We first show that
the residue classes partition $\mathbb{Z}$, that is, each integer falls in one and only one residue
class.


Theorem 2.4.1 Given $m > 0$, congruence modulo $m$ is an equivalence relation
on the integers. Therefore, the residue classes partition the integers.

Proof Recall that a relation $\sim$ on a set $S$ is an equivalence relation if it is reflexive,
that is, $s \sim s$ for all $s \in S$; symmetric, that is, if $s_1 \sim s_2$ then $s_2 \sim s_1$; and transitive,
that is, if $s_1 \sim s_2$ and $s_2 \sim s_3$ then $s_1 \sim s_3$. If $\sim$ is an equivalence relation then the
equivalence classes $[s] = \{s_1 \in S;\ s_1 \sim s\}$ partition $S$.

Consider $\equiv \bmod m$ on $\mathbb{Z}$. Given $x \in \mathbb{Z}$, $x - x = 0 = 0 \cdot m$, so $m \mid (x - x)$ and
$x \equiv x \bmod m$. Therefore, $\equiv \bmod m$ is reflexive.

Suppose $x \equiv y \bmod m$; then $m \mid (x - y) \implies x - y = am$ for some $a \in \mathbb{Z}$. Then
$y - x = -am$, so $m \mid (y - x)$ and $y \equiv x \bmod m$. Therefore, $\equiv \bmod m$ is symmetric.

Finally suppose $x \equiv y \bmod m$ and $y \equiv z \bmod m$. Then $x - y = a_1 m$ and
$y - z = a_2 m$. But then $x - z = (x - y) + (y - z) = a_1 m + a_2 m = (a_1 + a_2)m$.
Therefore, $m \mid (x - z)$ and $x \equiv z \bmod m$. Therefore, $\equiv \bmod m$ is transitive and the
theorem is proved.


Hence given m > 0 every integer falls into one and only one residue class. We 
now show that there are exactly m residue classes modulo m. 


Theorem 2.4.2 Given $m > 0$ there exist exactly $m$ residue classes. In particular,
$$[0], [1], \ldots, [m-1]$$

gives a complete set of residue classes.
Proof We show that given $x \in \mathbb{Z}$, $x$ must be congruent modulo $m$ to one of
$0, 1, 2, \ldots, m-1$. Further, no two of these are congruent modulo $m$. As a consequence

$$[0], [1], \ldots, [m-1]$$

give a complete set of residue classes modulo $m$, and hence there are $m$ of them.
To see these assertions suppose $x \in \mathbb{Z}$. By the division algorithm, we have

$$x = qm + r \quad \text{where } 0 \le r < m.$$

This implies that $r = x - qm$ or, in terms of congruences, that $x \equiv r \bmod m$.
Therefore, $x$ is congruent to one of $0, 1, 2, \ldots, m-1$.

Suppose $0 \le r_1 < r_2 < m$. Then $0 < r_2 - r_1 < m$, so $m \nmid (r_2 - r_1)$ and $r_1$ and $r_2$ are
incongruent modulo $m$. Therefore, every integer is congruent to one and only one of
$0, 1, \ldots, m-1$, and hence $[0], [1], \ldots, [m-1]$ give a complete set of residue classes modulo $m$.


There are many complete residue systems modulo $m$. In particular, a set
of $m$ integers $x_1, x_2, \ldots, x_m$ comprises a complete residue system modulo $m$ if
$x_i \not\equiv x_j \bmod m$ unless $i = j$. Given one complete residue system, it is easy to get
another.

Lemma 2.4.1 If $\{x_1, \ldots, x_m\}$ forms a complete residue system modulo $m$ and
$(a, m) = 1$, then $\{a x_1, \ldots, a x_m\}$ also comprises a complete residue system.

Proof Suppose $a x_i \equiv a x_j \bmod m$. Then $m \mid a(x_i - x_j)$. Since $(a, m) = 1$, by
Euclid’s lemma $m \mid (x_i - x_j)$ and hence $x_i \equiv x_j \bmod m$, so $i = j$. Thus the $m$ integers
$a x_1, \ldots, a x_m$ are pairwise incongruent modulo $m$ and form a complete residue system.
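Lemma 2.4.1 is easy to test numerically. In the Python sketch below (the name `is_complete_residue_system` is ours), a list is a complete residue system modulo $m$ exactly when it has $m$ elements with pairwise distinct remainders; Python's `%` returns a value in $0, \ldots, m-1$ for $m > 0$, even for negative inputs.

```python
def is_complete_residue_system(xs: list[int], m: int) -> bool:
    """True if xs consists of m integers that are pairwise incongruent mod m."""
    return len(xs) == m and len({x % m for x in xs}) == m


m = 10
xs = list(range(m))  # the complete residue system 0, 1, ..., m - 1
print(is_complete_residue_system([3 * x for x in xs], m))  # True, since (3, 10) = 1
print(is_complete_residue_system([4 * x for x in xs], m))  # False, since (4, 10) = 2
```

The second line shows that the hypothesis $(a, m) = 1$ cannot simply be dropped: multiplying by 4 sends both 0 and 5 to the class of 0.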


Finally, we will need the following: 
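Lemma 2.4.2 If $x \equiv y \bmod m$ then $(x, m) = (y, m)$.

Proof Suppose $x - y = am$. Then any common divisor of $x$ and $m$ also divides $y = x - am$,
so it is a common divisor of $y$ and $m$; by symmetry, any common divisor of $y$ and $m$ is a
common divisor of $x$ and $m$. Since the two pairs have the same common divisors, they have the same GCD.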


2.4.2 The Ring of Integers Mod N


Perhaps the easiest way to handle results on congruences is to place them in the
framework of abstract algebra. To do this we construct, for each $n > 0$, a ring called
the ring of integers modulo $n$. We will follow this approach. However, we note that
although this approach simplifies and clarifies many of the proofs, historically purely
number theoretical proofs were given. Often these purely number theoretical proofs
inspired the algebraic proofs.
To construct this ring, we first need the following:

Lemma 2.4.3 If $a \equiv b \bmod n$ and $c \equiv d \bmod n$ then

1. $a + c \equiv b + d \bmod n$
2. $ac \equiv bd \bmod n$

Proof Suppose $a \equiv b \bmod n$ and $c \equiv d \bmod n$; then $a - b = q_1 n$ and $c - d = q_2 n$
for some integers $q_1, q_2$. This implies that $(a + c) - (b + d) = (q_1 + q_2)n$, or that
$n \mid ((a + c) - (b + d))$. Therefore, $a + c \equiv b + d \bmod n$.
We leave the proof of (2) to the exercises.


We now define operations on the set of residue classes. 


Definition 2.4.2 Consider a complete residue system $x_1, \ldots, x_n$ modulo $n$. On the
set of residue classes $[x_1], \ldots, [x_n]$ define

1. $[x_i] + [x_j] = [x_i + x_j]$
2. $[x_i][x_j] = [x_i x_j]$

Theorem 2.4.3 Given a positive integer $n$, the set of residue classes forms a
commutative ring with an identity under the operations defined in Definition 2.4.2.
This is called the ring of integers modulo $n$ and is denoted by $\mathbb{Z}_n$. The zero element
is $[0]$ and the identity element is $[1]$.


Proof Notice that from Lemma 2.4.3 it follows that these operations are well-defined
on the set of residue classes, that is, if we take different representatives for a
residue class, the results of the operations are still the same.

To show $\mathbb{Z}_n$ is a commutative ring with an identity we must show that it satisfies,
relative to the defined operations, all the ring properties. Basically, $\mathbb{Z}_n$ inherits these
properties from $\mathbb{Z}$. We show commutativity of addition and leave the other properties
to the exercises.

Suppose $[a], [b] \in \mathbb{Z}_n$. Then

$$[a] + [b] = [a + b] = [b + a] = [b] + [a]$$

where $[a + b] = [b + a]$ since addition is commutative in $\mathbb{Z}$.


This theorem is actually a special case of a general result in abstract algebra. In
the ring of integers $\mathbb{Z}$, the set of multiples of a fixed integer $n$ forms an ideal (see [A] for
terminology), which is usually denoted $n\mathbb{Z}$. The ring $\mathbb{Z}_n$ is the quotient ring of $\mathbb{Z}$
modulo the ideal $n\mathbb{Z}$, that is, $\mathbb{Z}/n\mathbb{Z} = \mathbb{Z}_n$.

We usually consider $\mathbb{Z}_n$ as consisting of $0, 1, \ldots, n-1$ with addition and multiplication
modulo $n$. When there is no confusion we will denote the element $[a]$ in
$\mathbb{Z}_n$ just as $a$. Below we give the addition and multiplication tables modulo 5, that is,
in $\mathbb{Z}_5$.
EXAMPLE 2.4.2.1 Addition and Multiplication Tables for $\mathbb{Z}_5$

 + | 0 1 2 3 4          · | 0 1 2 3 4
---+----------        ---+----------
 0 | 0 1 2 3 4          0 | 0 0 0 0 0
 1 | 1 2 3 4 0          1 | 0 1 2 3 4
 2 | 2 3 4 0 1          2 | 0 2 4 1 3
 3 | 3 4 0 1 2          3 | 0 3 1 4 2
 4 | 4 0 1 2 3          4 | 0 4 3 2 1


Notice, for example, that modulo 5, $3 \cdot 4 = 12 \equiv 2 \bmod 5$, so that in $\mathbb{Z}_5$, $3 \cdot 4 = 2$.
Similarly, $4 + 2 = 6 \equiv 1 \bmod 5$, so in $\mathbb{Z}_5$, $4 + 2 = 1$.
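Tables such as these can be generated directly from the definition of the operations on residue classes, using the representatives $0, 1, \ldots, n-1$. The Python sketch below (the name `cayley_table` is ours) builds the table of an operation reduced mod $n$ and prints the two tables for $\mathbb{Z}_5$ in roughly the layout above, with `*` standing for multiplication.

```python
from typing import Callable


def cayley_table(n: int, op: Callable[[int, int], int]) -> list[list[int]]:
    """Table whose (a, b) entry is op(a, b) reduced mod n, for 0 <= a, b < n."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]


add5 = cayley_table(5, lambda a, b: a + b)
mul5 = cayley_table(5, lambda a, b: a * b)

for symbol, table in (("+", add5), ("*", mul5)):
    print(symbol, "|", *range(5))
    for a, row in enumerate(table):
        print(a, "|", *row)
    print()
```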

The question arises as to when the commutative ring $\mathbb{Z}_n$ is an integral domain and
when it is a field. The answer is: when $n$ is a prime, and only when $n$ is a prime.

Theorem 2.4.4 (1) $\mathbb{Z}_n$ is an integral domain if and only if $n$ is a prime.
(2) $\mathbb{Z}_n$ is a field if and only if $n$ is a prime.


Proof For $n > 1$, $\mathbb{Z}_n$ is a commutative ring with an identity $1 \neq 0$, so it will be an integral
domain if and only if it has no zero divisors.
Suppose first that $n$ is a prime and suppose that $ab = 0$ in $\mathbb{Z}_n$. Then in $\mathbb{Z}$ we have

$$ab \equiv 0 \bmod n \implies n \mid ab.$$
Since $n$ is prime, by Euclid’s lemma $n \mid a$ or $n \mid b$. In terms of congruences then
$$a \equiv 0 \bmod n \implies a = 0 \text{ in } \mathbb{Z}_n \quad \text{or} \quad b \equiv 0 \bmod n \implies b = 0 \text{ in } \mathbb{Z}_n.$$

Therefore, $\mathbb{Z}_n$ is an integral domain if $n$ is prime.
Suppose $n > 1$ is not prime. Then $n = m_1 m_2$ with $1 < m_1 < n$, $1 < m_2 < n$. Then
$n \nmid m_1$ and $n \nmid m_2$ but $n \mid m_1 m_2$. Translating this into $\mathbb{Z}_n$ we have

$$m_1 m_2 = 0 \text{ but } m_1 \neq 0 \text{ and } m_2 \neq 0.$$

Therefore, $\mathbb{Z}_n$ is not an integral domain if $n$ is not prime. (For $n = 1$, $\mathbb{Z}_1$ is the zero
ring, in which $1 = 0$, so it is not an integral domain either.) This proves part (1).

Since a field is an integral domain, $\mathbb{Z}_n$ cannot be a field unless $n$ is prime. To
complete part (2), we must show that if $n$ is prime then $\mathbb{Z}_n$ is a field. Suppose $n$ is
prime. Since $\mathbb{Z}_n$ is a commutative ring with identity, to show that it is a field we must
show that each nonzero element has a multiplicative inverse.

Suppose $a \in \mathbb{Z}_n$, $a \neq 0$. Then in $\mathbb{Z}$ we have $n \nmid a$, and hence, since $n$ is prime,
$(a, n) = 1$. Therefore, in $\mathbb{Z}$ there exist $x, y$ such that $ax + ny = 1$. In terms of
congruences this says that

$$ax \equiv 1 \bmod n,$$

or in $\mathbb{Z}_n$,
$$ax = 1.$$

Therefore, $a$ has an inverse in $\mathbb{Z}_n$, and hence $\mathbb{Z}_n$ is a field.
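For small $n$, Theorem 2.4.4 can be confirmed by brute force. The Python sketch below uses our own helper names `is_field` and `zero_divisor_pair`; it searches exhaustively for inverses and for zero divisors, so it is meant only as an illustration for small moduli.

```python
from typing import Optional


def is_field(n: int) -> bool:
    """True if n > 1 and every nonzero class of Z_n has a multiplicative inverse."""
    return n > 1 and all(
        any(a * x % n == 1 for x in range(1, n)) for a in range(1, n)
    )


def zero_divisor_pair(n: int) -> Optional[tuple[int, int]]:
    """Nonzero a, b in Z_n with a*b = 0, or None if Z_n has no zero divisors."""
    for a in range(1, n):
        for b in range(1, n):
            if a * b % n == 0:
                return a, b
    return None


print([n for n in range(2, 20) if is_field(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
print(zero_divisor_pair(6))  # (2, 3)
```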
The proof of the last theorem actually indicates a method to find the multiplicative
inverse of an element modulo a prime. Suppose $n$ is a prime and $a \neq 0$ in $\mathbb{Z}_n$. Use
the Euclidean algorithm in $\mathbb{Z}$ to express 1 as a linear combination of $a$ and $n$, that is,
$$ax + ny = 1.$$

The residue class of $x$ will be the multiplicative inverse of $a$.


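EXAMPLE 2.4.2.2 Find $6^{-1}$ in $\mathbb{Z}_{11}$.
Using the Euclidean algorithm,

$$11 = 1 \cdot 6 + 5$$
$$6 = 1 \cdot 5 + 1$$
$$\implies 1 = 6 - (1 \cdot 5) = 6 - (11 - 1 \cdot 6) \implies 1 = 2 \cdot 6 - 1 \cdot 11.$$
Therefore, the inverse of 6 modulo 11 is 2, that is, in $\mathbb{Z}_{11}$, $6^{-1} = 2$.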
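The procedure above, running the Euclidean algorithm while keeping track of the coefficient of $a$, can be sketched in Python as follows. The name `inverse_mod` is ours. Since Python 3.8 the built-in `pow(a, -1, n)` computes the same inverse, which the last line uses as a cross-check.

```python
def inverse_mod(a: int, n: int) -> int:
    """Inverse of a modulo n via the extended Euclidean algorithm; needs (a, n) = 1."""
    # Invariant: old_r = a*old_x (mod n) and r = a*x (mod n); the coefficient
    # of n is not needed, so it is not tracked.
    old_r, r = a % n, n
    old_x, x = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("a has no inverse modulo n")
    return old_x % n


print(inverse_mod(6, 11))  # 2
print(pow(6, -1, 11))      # 2
```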
EXAMPLE 2.4.2.3 Solve the linear equation
$$6x + 3 = 1$$

in $\mathbb{Z}_{11}$.
Using purely formal field algebra, the solution is

$$x = 6^{-1}(1 - 3).$$

In $\mathbb{Z}_{11}$ we have

$$1 - 3 = -2 = 9 \quad \text{and} \quad 6^{-1} = 2 \implies x = 2 \cdot 9 = 18 = 7.$$
Therefore, the solution in $\mathbb{Z}_{11}$ is $x = 7$. A quick check shows that
$$6 \cdot 7 + 3 = 42 + 3 = 45 = 1 \text{ in } \mathbb{Z}_{11}.$$
A linear equation in $\mathbb{Z}_{11}$ is called a linear congruence modulo 11. We will discuss
solutions of such congruences in Section 2.5.
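The formal field computation $x = a^{-1}(c - b)$ translates into a one-line Python sketch. The name `solve_linear` is ours; it relies on the built-in three-argument `pow` with exponent $-1$ (available since Python 3.8), which raises `ValueError` when $a$ has no inverse modulo $p$.

```python
def solve_linear(a: int, b: int, c: int, p: int) -> int:
    """Solve a*x + b = c in Z_p (p prime, p not dividing a) as x = a^{-1}(c - b)."""
    a_inv = pow(a, -1, p)  # inverse of a modulo p
    return a_inv * (c - b) % p


x = solve_linear(6, 3, 1, 11)
print(x)  # 7
assert (6 * x + 3) % 11 == 1
```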


The fact that $\mathbb{Z}_p$ is a field for $p$ a prime leads to the following nice result known
as Wilson’s theorem.

Theorem 2.4.5 (Wilson’s Theorem) If $p$ is a prime then

$$(p-1)! \equiv -1 \bmod p.$$
Proof Now $(p-1)! = (p-1)(p-2)\cdots 1$. Since $\mathbb{Z}_p$ is a field, each $x \in \{1, 2, \ldots, p-1\}$
has a multiplicative inverse modulo $p$. Further, suppose $x = x^{-1}$ in $\mathbb{Z}_p$. Then

$x^2 = 1$, which implies $(x - 1)(x + 1) = 0$ in $\mathbb{Z}_p$, and hence either $x = 1$ or
$x = -1$ since $\mathbb{Z}_p$ is an integral domain. Therefore, in $\mathbb{Z}_p$ only $1$ and $-1$ are their own
multiplicative inverses. Further, $-1 = p - 1$ since $p - 1 \equiv -1 \bmod p$.

Hence in the product $(p-1)(p-2)\cdots 1$ considered in the field $\mathbb{Z}_p$, each element is
paired up with its distinct multiplicative inverse except $1$ and $p - 1$. Further, the product
of each element with its inverse is 1. Therefore, in $\mathbb{Z}_p$ we have $(p-1)(p-2)\cdots 1 = p - 1$.
Written as a congruence,

$$(p-1)! \equiv p - 1 \equiv -1 \bmod p.$$

(For $p = 2$ the product is just $1! = 1 \equiv -1 \bmod 2$.)


The converse of Wilson’s theorem is also true, that is, if $n > 1$ and $(n-1)! \equiv -1 \bmod n$,
then $n$ must be a prime.

Theorem 2.4.6 If $n > 1$ is a natural number and
$$(n-1)! \equiv -1 \bmod n$$

then $n$ is a prime.


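Proof Suppose $(n-1)! \equiv -1 \bmod n$. If $n$ were composite, then $n = mk$ with
$1 < m \le n - 1$ and $1 < k \le n - 1$. If $m \neq k$, then both $m$ and $k$ occur as distinct factors in $(n-1)!$.
It follows that $(n-1)!$ is divisible by $mk = n$, so that $(n-1)! \equiv 0 \bmod n$, contradicting the
assertion that $(n-1)! \equiv -1 \bmod n$. If $m = k$, then $m < n$ is a factor of $(n-1)!$, so
$(n-1)! \equiv 0 \bmod m$; but since $m \mid n$, the hypothesis gives $(n-1)! \equiv -1 \bmod m$, and
$0 \not\equiv -1 \bmod m$ because $m > 1$. For instance, if $m = k = 2$ then $n = 4$
and $(n-1)! = 6$, which is not congruent to $-1 \bmod 4$. Therefore, $n$ must be prime.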
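Theorems 2.4.5 and 2.4.6 together say that, for $n > 1$, the congruence $(n-1)! \equiv -1 \bmod n$ holds exactly when $n$ is prime. The Python sketch below (our own `wilson_holds`) checks this for small $n$, reducing modulo $n$ after every multiplication to keep the numbers small. As a primality test it is impractical, since it needs about $n$ multiplications.

```python
def wilson_holds(n: int) -> bool:
    """True if (n - 1)! = -1 mod n, computing the factorial with reductions mod n."""
    f = 1
    for k in range(2, n):
        f = f * k % n
    return f == n - 1  # n - 1 is the residue of -1 modulo n


print([n for n in range(2, 30) if wilson_holds(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```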


2.4.3 Units and the Euler Phi Function 


In a field $F$ every nonzero element has a multiplicative inverse. If $R$ is a commutative ring with an identity, not necessarily a field, then a unit is any element with a multiplicative inverse. In this case its inverse is also a unit. For example, in the integers $\mathbb{Z}$ the only units are $\pm 1$. The set of units in a commutative ring with identity forms an abelian group under ring multiplication, called the unit group of $R$. Recall that a group $G$ is a set with one operation which is associative, has an identity for that operation, and such that each element has an inverse with respect to this operation. If the operation is also commutative then $G$ is an abelian group.


Lemma 2.4.4 If $R$ is a commutative ring with an identity then the set of units in $R$ forms an abelian group under ring multiplication. This is called the unit group of $R$, denoted $U(R)$.
Proof The commutativity and associativity of $U(R)$ follow from the ring properties. The identity of $U(R)$ is the multiplicative identity of $R$, while the ring multiplicative inverse of each unit is its group inverse. We must show that $U(R)$ is closed under ring multiplication. If $a \in R$ is a unit we denote its multiplicative inverse by $a^{-1}$.


Now suppose $a, b \in U(R)$. Then $a^{-1}, b^{-1}$ exist. It follows that
$$(ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = aa^{-1} = 1.$$


Hence $ab$ has an inverse, namely $b^{-1}a^{-1}$ ($= a^{-1}b^{-1}$ in a commutative ring), and hence $ab$ is also a unit. Therefore, $U(R)$ is closed under ring multiplication.


The proof of Theorem 2.4.4 actually provides a method to classify the units in any $\mathbb{Z}_n$.


Lemma 2.4.5 $a \in \mathbb{Z}_n$ is a unit if and only if $(a, n) = 1$.


Proof Suppose $(a, n) = 1$. Then there exist $x, y \in \mathbb{Z}$ such that $ax + ny = 1$. This implies that $ax \equiv 1 \bmod n$, which in turn implies that $ax = 1$ in $\mathbb{Z}_n$, and therefore $a$ is a unit.

Conversely, suppose $a$ is a unit in $\mathbb{Z}_n$. Then there is an $x \in \mathbb{Z}_n$ with $ax = 1$. In terms of congruences,
$$ax \equiv 1 \bmod n \implies n \mid (ax - 1) \implies ax - 1 = ny \implies ax - ny = 1.$$


Therefore, $1$ is a linear combination of $a$ and $n$, so any common divisor of $a$ and $n$ divides $1$, and hence $(a, n) = 1$.
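The proof is constructive: the coefficient $x$ in $ax + ny = 1$ is the inverse of $a$ in $\mathbb{Z}_n$. A minimal Python sketch of this idea uses the extended Euclidean algorithm to produce $x$. The function names are hypothetical, and Python's built-in `pow(a, -1, n)` (3.8+) would return the same inverse.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(a: int, n: int):
    """Inverse of a in Z_n, or None when gcd(a, n) != 1 (a is not a unit)."""
    g, x, _ = extended_gcd(a % n, n)
    return x % n if g == 1 else None


def units(n: int):
    """Residues 1..n-1 that are units of Z_n (assumes n >= 2)."""
    return [a for a in range(1, n) if extended_gcd(a, n)[0] == 1]


print(units(6))              # [1, 5]
print(inverse_mod(5, 6))     # 5, since 5 * 5 = 25 ≡ 1 (mod 6)
print(inverse_mod(4, 6))     # None, since gcd(4, 6) = 2
```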


If $a$ is a unit in $\mathbb{Z}_n$ then a linear equation
$$ax + b = c$$
can always be solved, with a unique solution given by $x = a^{-1}(c - b)$. This solution is found by the same technique as in $\mathbb{Z}_p$ with $p$ a prime. If $a$ is not a unit the situation is more complicated. We will consider this case in Section 2.5.


EXAMPLE 2.4.3.1

Solve $5x + 4 = 2$ in $\mathbb{Z}_6$.

Since $(5, 6) = 1$, $5$ is a unit in $\mathbb{Z}_6$. Therefore, $x = 5^{-1}(2 - 4)$. Now $2 - 4 = -2 = 4$ in $\mathbb{Z}_6$. Further $5 = -1$, so $5^{-1} = (-1)^{-1} = -1$. Then we have
$$x = 5^{-1}(2 - 4) = (-1)(4) = -4 = 2.$$
Thus the unique solution in $\mathbb{Z}_6$ is $x = 2$.


Since an element $a$ is a unit in $\mathbb{Z}_n$ if and only if $(a, n) = 1$, it follows that the number of units in $\mathbb{Z}_n$ is equal to the number of positive integers less than or equal to $n$ and relatively prime to $n$. This number is given by the Euler Phi Function, our first look at a number theoretical function.
Definition 2.4.3 For any $n > 0$,
$$\phi(n) = \text{number of positive integers less than or equal to } n \text{ and relatively prime to } n.$$


EXAMPLE 2.4.3.2
$\phi(6) = 2$ since among $1, 2, 3, 4, 5, 6$ only $1, 5$ are relatively prime to $6$.


The following is immediate from our characterization of units: 


Lemma 2.4.6 The number of units in $\mathbb{Z}_n$, which is the order of the unit group $U(\mathbb{Z}_n)$, is $\phi(n)$.


Definition 2.4.4 Given $n > 0$, a reduced residue system modulo $n$ is a set of integers $x_1, \ldots, x_k$ such that each $x_i$ is relatively prime to $n$, $x_i \not\equiv x_j \bmod n$ unless $i = j$, and if $(x, n) = 1$ for some integer $x$ then $x \equiv x_i \bmod n$ for some $i$.


Thus a reduced residue system is a complete collection of representatives of those residue classes of integers that are relatively prime to $n$. Hence it is a complete collection of units (up to congruence modulo $n$) in $\mathbb{Z}_n$. It follows that any reduced residue system modulo $n$ has $\phi(n)$ elements.


EXAMPLE 2.4.3.3 
A reduced residue system modulo 6 would be {1, 5}. 


We now develop a formula for $\phi(n)$. As is the theme of this book, we first determine a formula for prime powers and then paste back together via the fundamental theorem of arithmetic.


Lemma 2.4.7 For any prime $p$ and $m > 0$,
$$\phi(p^m) = p^m - p^{m-1} = p^m\left(1 - \frac{1}{p}\right).$$


Proof Recall that for any integer $a$, either $p \mid a$ or $(a, p) = 1$, and $(a, p^m) = 1$ exactly when $(a, p) = 1$. It follows that the positive integers less than or equal to $p^m$ which are not relatively prime to $p^m$ are precisely the multiples of $p$, that is, $p, 2p, 3p, \ldots, p^{m-1}p$, and there are $p^{m-1}$ of them. All other positive $a \le p^m$ are relatively prime to $p^m$. Hence the number of positive integers less than or equal to $p^m$ and relatively prime to $p^m$ is
$$p^m - p^{m-1}.$$
Lemma 2.4.8 If $(a, b) = 1$ then $\phi(ab) = \phi(a)\phi(b)$.
Proof Let $R_a = \{x_1, \ldots, x_{\phi(a)}\}$ be a reduced residue system modulo $a$, $R_b = \{y_1, \ldots, y_{\phi(b)}\}$ be a reduced residue system modulo $b$, and let
$$S = \{a y_i + b x_j : i = 1, \ldots, \phi(b),\ j = 1, \ldots, \phi(a)\}.$$


We claim that $S$ is a reduced residue system modulo $ab$. Since $S$ has $\phi(a)\phi(b)$ elements, it will follow that $\phi(ab) = \phi(a)\phi(b)$.

To show that $S$ is a reduced residue system modulo $ab$ we must show three things: first, each $x \in S$ is relatively prime to $ab$; second, the elements of $S$ are distinct modulo $ab$; and finally, given any integer $n$ with $(n, ab) = 1$, then $n \equiv s \bmod ab$ for some $s \in S$.

Let $x = a y_i + b x_j$. Then $x \equiv b x_j \bmod a$, and since $(x_j, a) = 1$ and $(a, b) = 1$ it follows that $(x, a) = 1$. Analogously, $(x, b) = 1$. Since $x$ is relatively prime to both $a$ and $b$ we have $(x, ab) = 1$. This shows that each element of $S$ is relatively prime to $ab$.

Next suppose that
$$a y_i + b x_j \equiv a y_k + b x_l \bmod ab.$$
Then
$$ab \mid \big((a y_i + b x_j) - (a y_k + b x_l)\big) \implies a y_i \equiv a y_k \bmod b.$$
Since $(a, b) = 1$ it follows that $y_i \equiv y_k \bmod b$. But then $y_i = y_k$ since $R_b$ is a reduced residue system. Similarly, $x_j = x_l$. This shows that the elements of $S$ are distinct modulo $ab$.
Finally, suppose $(n, ab) = 1$. Since $(a, b) = 1$ there exist $x, y$ with $ax + by = 1$. Then
$$a(nx) + b(ny) = n.$$
From $ax + by = 1$ we get $(x, b) = 1$, and since $(n, b) = 1$ it follows that $(nx, b) = 1$. Therefore, there is a $y_i \in R_b$ with $nx = y_i + tb$ for some integer $t$. In the same manner $(ny, a) = 1$, and so there is an $x_j \in R_a$ with $ny = x_j + ua$ for some integer $u$. Then
$$a(y_i + tb) + b(x_j + ua) = n \implies n = a y_i + b x_j + (t + u)ab \implies n \equiv a y_i + b x_j \bmod ab,$$
and we are done.
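The construction in this proof is easy to test numerically. The Python sketch below builds $S = \{a y_i + b x_j\}$ from the least positive reduced residue systems $R_a$ and $R_b$ and checks the three conditions of Definition 2.4.4 modulo $ab$. The function names are illustrative only, and the check assumes $(a, b) = 1$ as in the lemma.

```python
from math import gcd


def reduced_residues(n: int) -> list:
    """Least positive reduced residue system modulo n: 1 <= x <= n with gcd(x, n) = 1."""
    return [x for x in range(1, n + 1) if gcd(x, n) == 1]


def build_S(a: int, b: int) -> list:
    """S = {a*y_i + b*x_j : y_i in R_b, x_j in R_a}, as in the proof (assumes gcd(a, b) = 1)."""
    R_a, R_b = reduced_residues(a), reduced_residues(b)
    return [a * y + b * x for y in R_b for x in R_a]


def is_reduced_residue_system(S: list, n: int) -> bool:
    """Check the three conditions of Definition 2.4.4 for the integers in S."""
    classes = [s % n for s in S]
    coprime = all(gcd(s, n) == 1 for s in S)
    distinct = len(set(classes)) == len(classes)
    complete = set(classes) == {r % n for r in reduced_residues(n)}
    return coprime and distinct and complete


S = build_S(4, 9)
print(len(S), is_reduced_residue_system(S, 36))  # 12 True
```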


We now give the general formula for $\phi(n)$.


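Theorem 2.4.7 Suppose $n = p_1^{e_1} \cdots p_k^{e_k}$. Then
$$\phi(n) = \left(p_1^{e_1} - p_1^{e_1 - 1}\right)\left(p_2^{e_2} - p_2^{e_2 - 1}\right) \cdots \left(p_k^{e_k} - p_k^{e_k - 1}\right) = n \prod_{i=1}^{k}\left(1 - \frac{1}{p_i}\right).$$


Proof Since the prime powers $p_i^{e_i}$ are pairwise relatively prime, repeated application of Lemma 2.4.8 gives
$$\phi(n) = \phi(p_1^{e_1})\phi(p_2^{e_2}) \cdots \phi(p_k^{e_k}),$$
and by Lemma 2.4.7
$$\phi(n) = \left(p_1^{e_1} - p_1^{e_1 - 1}\right) \cdots \left(p_k^{e_k} - p_k^{e_k - 1}\right) = p_1^{e_1}\left(1 - \frac{1}{p_1}\right) \cdots p_k^{e_k}\left(1 - \frac{1}{p_k}\right)$$
$$= p_1^{e_1} \cdots p_k^{e_k}\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_k}\right) = n \prod_{i=1}^{k}\left(1 - \frac{1}{p_i}\right).$$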


EXAMPLE 2.4.3.4
Determine $\phi(126)$. Now
$$126 = 2 \cdot 3^2 \cdot 7 \implies \phi(126) = \phi(2)\phi(3^2)\phi(7) = (1)(3^2 - 3)(6) = 36.$$
Hence there are 36 units in $\mathbb{Z}_{126}$.
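Theorem 2.4.7 turns $\phi(n)$ into a computation on the prime factorization. The Python sketch below compares that formula with a direct count from Definition 2.4.3. It assumes simple trial-division factorization, which is fine for small $n$ but is not how one would factor large numbers, and all names are our own.

```python
from math import gcd


def phi_bruteforce(n: int) -> int:
    """phi(n) straight from the definition: count 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_factorization(n: int) -> dict:
    """Trial-division factorization: returns {p: e} with n = prod p**e."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi_formula(n: int) -> int:
    """phi(n) = prod (p^e - p^(e-1)) over the prime powers p^e exactly dividing n."""
    result = 1
    for p, e in prime_factorization(n).items():
        result *= p**e - p**(e - 1)
    return result


print(prime_factorization(126))  # {2: 1, 3: 2, 7: 1}
print(phi_formula(126))          # 36
```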


An interesting result with many generalizations which we will look at later is the 
following. 


Theorem 2.4.8 For $n \ge 1$,
$$\sum_{d \mid n} \phi(d) = n,$$
where the sum is taken over all positive divisors $d$ of $n$.


Proof As before we first prove the theorem for prime powers and then paste together via the fundamental theorem of arithmetic.
Suppose that $n = p^e$ for $p$ a prime. Then the divisors of $n$ are $1, p, p^2, \ldots, p^e$, so
$$\sum_{d \mid n} \phi(d) = \phi(1) + \phi(p) + \phi(p^2) + \cdots + \phi(p^e) = 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1}).$$
Notice that this sum telescopes, that is, $1 + (p - 1) = p$, $p + (p^2 - p) = p^2$, and so on. Hence the sum is just $p^e$ and the result is proved for a prime power.

We now do an induction on the number of distinct prime factors of $n$. The above argument shows that the result is true if $n$ has only one distinct prime factor. Assume that the result is true whenever an integer has fewer than $k$ distinct prime factors and suppose $n = p_1^{e_1} \cdots p_k^{e_k}$ has $k$ distinct prime factors. Then $n = p^e c$ where $p = p_1$, $e = e_1$, and $c$ has fewer than $k$ distinct prime factors. By the inductive hypothesis,
$$\sum_{d \mid c} \phi(d) = c.$$
Since $(c, p) = 1$ the divisors of $n$ are all of the form $p^a d_1$ where $d_1 \mid c$ and $a = 0, 1, \ldots, e$. It follows that
$$\sum_{d \mid n} \phi(d) = \sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e d_1).$$


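Since $(d_1, p^a) = 1$ for any divisor $d_1$ of $c$, Lemma 2.4.8 shows that this sum equals
$$\sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p)\phi(d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e)\phi(d_1)$$
$$= \sum_{d_1 \mid c} \phi(d_1) + (p - 1)\sum_{d_1 \mid c} \phi(d_1) + \cdots + (p^e - p^{e-1})\sum_{d_1 \mid c} \phi(d_1)$$
$$= c + (p - 1)c + \cdots + (p^e - p^{e-1})c.$$
As in the case of prime powers this sum telescopes, giving the final result
$$= p^e c = n.$$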


EXAMPLE 2.4.3.5
Consider $n = 10$. The divisors are $1, 2, 5, 10$. Then $\phi(1) = 1$, $\phi(2) = 1$, $\phi(5) = 4$, $\phi(10) = 4$. Then
$$\phi(1) + \phi(2) + \phi(5) + \phi(10) = 1 + 1 + 4 + 4 = 10.$$
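The identity $\sum_{d \mid n} \phi(d) = n$ is easy to spot-check by machine. This brute-force Python sketch uses hypothetical helper names and computes $\phi$ directly from its definition, so it is only meant for small $n$.

```python
from math import gcd


def phi(n: int) -> int:
    """Count 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def divisors(n: int) -> list:
    """Positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def phi_divisor_sum(n: int) -> int:
    """Left-hand side of Theorem 2.4.8."""
    return sum(phi(d) for d in divisors(n))


print([(d, phi(d)) for d in divisors(10)])  # [(1, 1), (2, 1), (5, 4), (10, 4)]
print(phi_divisor_sum(10))                  # 10
```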


2.4.4 Fermat’s Little Theorem and the Order of an Element 


For any positive integer $n$ the unit group $U(\mathbb{Z}_n)$ is a finite abelian group. Recall that in any group $G$ each element $g \in G$ generates a cyclic subgroup consisting of all the distinct powers of $g$. If this cyclic subgroup is finite of order $m$ then $m$ is called the order of the element $g$. Equivalently, the order of an element $g \in G$ can be described as the least positive power $m$ such that $g^m = 1$. If no such power exists then $g$ has infinite order. We denote the order of the group $G$ by $|G|$ and the order of $g \in G$ by $|g|$. If the whole group $G$ is finite then each element clearly has finite order. We will apply these ideas to the unit group $U(\mathbb{Z}_n)$, but first we recall some further facts about finite groups.


Theorem 2.4.9 (Lagrange’s Theorem) Suppose G is a finite group of order n. Then 
the order of any subgroup divides n. In particular, the order of any element divides 
the order of the group. 


If $g \in G$ with $|G| = n$, then by Lagrange's theorem above the order $m = |g|$ satisfies $g^m = 1$ and $m \mid n$. Hence $n = mk$ and so $g^n = g^{mk} = (g^m)^k = 1^k = 1$. Hence in any finite group, we have the following:


Corollary 2.4.1 If $G$ is a finite group of order $n$ and $g \in G$ then $g^n = 1$.


Theorem 2.4.10 Let $G$ be a finite abelian group with $|G| = n$. Then
1. if $g_1, g_2 \in G$ with $|g_1| = a$, $|g_2| = b$ then $(g_1 g_2)^{ab} = 1$,
2. if $g_1, g_2 \in G$ with $|g_1| = a$, $|g_2| = b$ and $(a, b) = 1$ then $|g_1 g_2| = ab$,
3. if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$ then
$$G = H_1 \times H_2 \times \cdots \times H_k$$
where $|H_i| = p_i^{e_i}$.


The third part of the last theorem is part of the Fundamental Theorem for 
Finitely Generated Abelian Groups which plays the same role in abelian group 
theory as the fundamental theorem of arithmetic does in number theory. 

With these facts in hand, consider a unit $a \in \mathbb{Z}_n$. Then $a \in U(\mathbb{Z}_n)$ and hence $a$ has a multiplicative order, that is, there is a positive integer $m$ with $a^m = 1$ in $\mathbb{Z}_n$. In terms of congruences this means that $a^m \equiv 1 \bmod n$. If $a \in \mathbb{Z}_n$ is not a unit then there cannot exist a power $m \ge 1$ such that $a^m \equiv 1 \bmod n$, for if such an $m$ existed then $a^{m-1}$ would be an inverse for $a$.

Lemma 2.4.9 Given $n > 0$, for an integer $a$ there exists a positive integer $m$ such that $a^m \equiv 1 \bmod n$ if and only if $(a, n) = 1$, or equivalently $a$ is a unit in $\mathbb{Z}_n$.


Definition 2.4.5 If $(a, n) = 1$ then the order of $a$ modulo $n$ is the least positive power $m$ such that $a^m \equiv 1 \bmod n$. We will write $\operatorname{order}(a)$ or alternatively $|\langle a \rangle|$ or $|a|$ for the order of $a$. Equivalently, the order of $a$ is the order of $a$ considered as an element of the unit group $U(\mathbb{Z}_n)$.


Since the order of $U(\mathbb{Z}_n)$ is $\phi(n)$, Lagrange's theorem immediately gives that the order of any element modulo $n$ must divide $\phi(n)$.


Lemma 2.4.10 If $(a, n) = 1$ then $\operatorname{order}(a) \mid \phi(n)$.


Applying Corollary 2.4.1 to the unit group $U(\mathbb{Z}_n)$ we get the following result, known as Euler's theorem.


Theorem 2.4.11 (Euler's Theorem) If $(a, n) = 1$ then
$$a^{\phi(n)} \equiv 1 \bmod n.$$

If $n = p$ is a prime then any integer $a \not\equiv 0 \bmod p$ is a unit in $\mathbb{Z}_p$. Further $\phi(p) = p - 1$, and hence we obtain the next corollary, which is called Fermat's theorem. (This is often called Fermat's Little Theorem to distinguish it from the result on $x^n + y^n = z^n$.)


Corollary 2.4.2 If $p$ is a prime and $p \nmid a$ then
$$a^{p-1} \equiv 1 \bmod p.$$


If $(a, n) = 1$ and the order of $a$ is exactly $\phi(n)$ then $a$ is called a primitive root modulo $n$. In this case, the unit group is cyclic with $a$ as a generator. For $n = p$ a prime there is always a primitive root.
Theorem 2.4.12 For a prime $p$ there is always an element $a$ of order $\phi(p) = p - 1$, that is, a primitive root. Equivalently, the unit group of $\mathbb{Z}_p$ is always cyclic.


Proof Since every nonzero element in $\mathbb{Z}_p$ is a unit, the unit group $U(\mathbb{Z}_p)$ is precisely the multiplicative group of the field $\mathbb{Z}_p$. The fact that $U(\mathbb{Z}_p)$ is cyclic follows from the following more general result, whose proof we also give.


Theorem 2.4.13 Let F be a field. Then any finite subgroup of the multiplicative 
group of F must be cyclic. 


Proof Suppose $G \subseteq F$ is a finite subgroup of the multiplicative group of $F$, and suppose $|G| = n$. As has been our general mode of approaching results, we will prove it for $n$ a power of a prime and then paste the result together via the fundamental theorem of arithmetic.

Suppose $n = p^k$ for some $k$. Then the order of any element in $G$ is $p^a$ with $a \le k$. Let $p^t$, with $t \le k$, be the maximal order of an element of $G$. Since all the orders are powers of $p$, the lcm of the orders is $p^t$. It follows that for every $g \in G$ we have $g^{p^t} = 1$. Therefore, every $g \in G$ is a root of the polynomial equation
$$x^{p^t} - 1 = 0.$$
However, over a field a polynomial cannot have more roots than its degree. If $t < k$ then $G$ would supply $n = p^k > p^t$ roots of this polynomial, which is a contradiction. Therefore, the maximal order must be $p^k = n$. Hence $G$ has an element of order $n = p^k$, this element generates $G$, and $G$ must be cyclic.

We now do an induction on the number of distinct prime factors in $n = |G|$. The above argument handles the case where there is only one distinct prime factor. Assume the result is true if the order of $G$ has fewer than $k$ distinct prime factors. Suppose $n = p_1^{e_1} \cdots p_k^{e_k}$. Then $n = p^e c$ where $p = p_1$, $e = e_1$, and $c$ has fewer than $k$ distinct prime factors. Since $G$ is a finite abelian group (multiplication in $F$ is commutative), Theorem 2.4.10 gives
$$|G| = n = p^e c \implies G = H \times K \text{ with } |H| = p^e,\ |K| = c.$$
By the inductive hypothesis, applied to the finite subgroups $H$ and $K$ of the multiplicative group of $F$, both $H$ and $K$ are cyclic, so $H$ has an element $h_1$ of order $p^e$ and $K$ has an element $h_2$ of order $c$. Since $(p^e, c) = 1$ the element $h_1 h_2$ has order $p^e c = n$, completing the proof.


EXAMPLE 2.4.4.1 Determine a primitive root modulo 7.

This is equivalent to finding a generator for the multiplicative group of $\mathbb{Z}_7$. The nonzero elements are $1, 2, 3, 4, 5, 6$ and we are looking for an element of order 6.

The table below lists these elements and their orders.
$$\begin{array}{c|cccccc} x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline |x| & 1 & 3 & 6 & 3 & 6 & 2 \end{array}$$
Therefore, there are two primitive roots modulo 7, namely 3 and 5. To see how these were determined, powers were taken modulo 7 until a value of 1 was obtained. For example,
$$3^2 = 9 = 2,\quad 3^3 = 2 \cdot 3 = 6,\quad 3^4 = 3 \cdot 6 = 18 = 4,\quad 3^5 = 3 \cdot 4 = 12 = 5,$$
$$3^6 = 3 \cdot 5 = 15 = 1.$$


EXAMPLE 2.4.4.2 Show that there is no primitive root modulo 15.

The units in $\mathbb{Z}_{15}$ are $\{1, 2, 4, 7, 8, 11, 13, 14\}$. Since $\phi(15) = 8$ we must show that there is no element of order 8. The table below gives the units and their respective orders.
$$\begin{array}{c|cccccccc} x & 1 & 2 & 4 & 7 & 8 & 11 & 13 & 14 \\ \hline |x| & 1 & 4 & 2 & 4 & 4 & 2 & 4 & 2 \end{array}$$
Therefore, there is no element of order 8.
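Both tables come from the procedure just described: multiply by $x$ repeatedly until the power returns to 1. A minimal Python sketch of that procedure follows. The names are hypothetical, and it assumes $n \ge 2$.

```python
from math import gcd


def multiplicative_order(a: int, n: int) -> int:
    """Least m >= 1 with a**m ≡ 1 (mod n), found by repeated multiplication (n >= 2)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be a unit modulo n")
    m, power = 1, a % n
    while power != 1:
        power = power * a % n
        m += 1
    return m


def order_table(n: int) -> dict:
    """Map each unit of Z_n (as a residue 1..n-1) to its order."""
    return {a: multiplicative_order(a, n) for a in range(1, n) if gcd(a, n) == 1}


def primitive_roots(n: int) -> list:
    """Units whose order equals phi(n), the number of units."""
    table = order_table(n)
    return [a for a, m in table.items() if m == len(table)]


print(order_table(7))        # {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
print(primitive_roots(7))    # [3, 5]
print(order_table(15))       # {1: 1, 2: 4, 4: 2, 7: 4, 8: 4, 11: 2, 13: 4, 14: 2}
print(primitive_roots(15))   # []
```

The same table also illustrates Lemma 2.4.10: every order that appears divides $\phi(n)$.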


Modulo a prime, there is always a primitive root but other integers can have 
primitive roots also. The fundamental result describing when an integer will have a 
primitive root is the following. We outline the proof in the exercises. 


Theorem 2.4.14 An integer $n \ge 2$ has a primitive root modulo $n$ if and only if
$$n = 2,\ 4,\ p^k, \text{ or } 2p^k,$$
where $p$ is an odd prime and $k \ge 1$.
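Theorem 2.4.14 can be compared against brute force for small moduli. The Python sketch below searches for a unit of order $\phi(n)$ and checks the answer against the list $2, 4, p^k, 2p^k$ ($p$ odd). It is a numerical illustration under our own naming, not a proof, and the exhaustive search is only reasonable for small $n$.

```python
from math import gcd


def has_primitive_root(n: int) -> bool:
    """Brute force (n >= 2): does some unit modulo n have order phi(n)?"""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    for a in units:
        m, power = 1, a
        while power != 1:
            power = power * a % n
            m += 1
        if m == len(units):
            return True
    return False


def is_odd_prime_power(n: int) -> bool:
    """True if n = p**k for an odd prime p and k >= 1."""
    if n < 3 or n % 2 == 0:
        return False
    p = 3
    while n % p != 0:      # the smallest divisor p > 1 of odd n is an odd prime
        p += 2
    while n % p == 0:
        n //= p
    return n == 1


def predicted_by_theorem(n: int) -> bool:
    return n in (2, 4) or is_odd_prime_power(n) or (n % 2 == 0 and is_odd_prime_power(n // 2))


print([n for n in range(2, 30) if has_primitive_root(n)])
print(all(has_primitive_root(n) == predicted_by_theorem(n) for n in range(2, 150)))  # True
```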


The order of an element, especially Fermat's theorem, provides a method for primality testing. Primality testing refers to determining, for a given integer $n$, whether it is prime or not. The simplest primality test is the following. If $n$ were composite then $n = m_1 m_2$ with $1 < m_1 < n$, $1 < m_2 < n$. At least one of these factors must be $\le \sqrt{n}$. Therefore, check all the integers $d$ with $1 < d \le \sqrt{n}$. If none of these divides $n$ then $n$ is prime. This can be improved using the fundamental theorem of arithmetic. If $n$ has a divisor $d$ with $1 < d \le \sqrt{n}$ then it has a prime divisor $\le \sqrt{n}$. It follows that in the above divisibility check, only the primes $\le \sqrt{n}$ need be checked.
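The trial-division test just described fits in a few lines of Python. As a simplification, this sketch tests 2 and then every odd $d$ with $d^2 \le n$. That set contains all primes $\le \sqrt{n}$ but is not restricted to them, because listing only the primes would require a separate prime table. The function name is our own.

```python
def is_prime(n: int) -> bool:
    """Trial division: test 2, then odd d with d*d <= n (a superset of the primes <= sqrt(n))."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


print([n for n in range(30) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(is_prime(77))                           # False (77 = 7 * 11)
```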

While this method always works, it is often impractical for large $n$, and other methods must be employed to see if a number is prime. By Fermat's theorem, if $n$ were prime and $1 \le a < n$ then $a^{n-1} \equiv 1 \bmod n$. If a number $a$ is found for which this is not true then $n$ cannot be prime. We give a trivial example.


EXAMPLE 2.4.4.3 Determine if 77 is prime.
If 77 were prime then $2^{76} \equiv 1 \bmod 77$. Now
$$2^{76} = (2^2)^{38} = 4^{38}.$$
Now we do computations mod 77:
$$4^3 = 64 \equiv -13 \implies 4^6 \equiv 169 \equiv 15 \implies 4^{12} \equiv 225 \equiv 71 \equiv -6$$
$$\implies 4^{36} \equiv (-6)^3 = -216 \equiv -62 \implies 4^{38} = 4^2 \cdot 4^{36} \equiv 16 \cdot (-62) = -992 \equiv -68 \not\equiv 1.$$
Therefore, 77 is not prime.


This method can show that a number $n$ is not prime; however, it cannot show that $n$ is prime. There are composite numbers $n$ for which $a^{n-1} \equiv 1 \bmod n$ holds for all $a$ with $(a, n) = 1$. These are called pseudoprimes (more precisely, absolute pseudoprimes or Carmichael numbers). We will discuss primality testing further and in more detail in Chapter 5.
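The hand computation of $2^{76} \bmod 77$ is a special case of fast modular exponentiation. The Python sketch below uses right-to-left square-and-multiply (Python's built-in three-argument `pow` does the same job). It applies the Fermat check to 77 and then shows that $561 = 3 \cdot 11 \cdot 17$ passes the check for every base coprime to it. The function names are illustrative only.

```python
from math import gcd


def power_mod(a: int, e: int, n: int) -> int:
    """a**e mod n by right-to-left square-and-multiply (e >= 0, n >= 1)."""
    result = 1 % n
    base = a % n
    while e > 0:
        if e & 1:                      # current binary digit of e is 1
            result = result * base % n
        base = base * base % n         # base runs through a, a^2, a^4, a^8, ...
        e >>= 1
    return result


def fermat_witness(a: int, n: int) -> bool:
    """True if a proves n composite, i.e. a**(n-1) is not ≡ 1 (mod n)."""
    return power_mod(a, n - 1, n) != 1


print(power_mod(2, 76, 77))    # 9, which is ≡ -68 (mod 77) as in the example
print(fermat_witness(2, 77))   # True, so 77 is composite

# 561 is composite, yet no base coprime to it is a Fermat witness
print(all(power_mod(a, 560, 561) == 1 for a in range(1, 561) if gcd(a, 561) == 1))  # True
```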


2.4.5 On Cyclic Groups 


In the previous sections, we used some material from abstract algebra to prove results 
in number theory. Here we briefly reverse the procedure to use some number theory 
to develop and prove other ideas from algebra. After we do this we will turn the 
tables back again and use this algebra to give another proof of Theorem 2.4.8 on the 
Euler phi function. 

Recall that a cyclic group $G$ is a group with a single generator, say $g$. We denote a cyclic group $G$ with generator $g$ by $\langle g \rangle$. The group $G$ then consists of all the powers of $g$, that is, $G = \{1, g^{\pm 1}, g^{\pm 2}, \ldots\}$. If $G$ is finite of order $n$ then $g^n = 1$ and $n$ is the least positive integer $x$ such that $g^x = 1$. It is then clear that if $g^m = 1$ for some power $m$ it must follow that $m \equiv 0 \bmod n$, and if $g^k = g^l$ then $k \equiv l \bmod n$.

Let $H = (\mathbb{Z}_n, +)$ denote the additive group of $\mathbb{Z}_n$. Then $H$ is cyclic of order $n$ with generator $1$. If $G = \langle g \rangle$ is also cyclic of order $n$ then, since multiplication of group elements is done via addition of exponents, it is fairly straightforward that the homomorphism $f: G \to (\mathbb{Z}_n, +)$ given by $g \mapsto 1$ (so that $g^k \mapsto k \bmod n$) is actually an isomorphism (see the exercises). Further, if $G = \langle g \rangle$ is cyclic of infinite order then $g \mapsto 1$ gives an isomorphism from $G$ to the additive group of $\mathbb{Z}$.


Lemma 2.4.11 (1) If $G$ is a finite cyclic group of order $n$ then $G$ is isomorphic to $(\mathbb{Z}_n, +)$. In particular, all finite cyclic groups of a given order are isomorphic.
(2) If $G$ is an infinite cyclic group then $G$ is isomorphic to $(\mathbb{Z}, +)$.


Cyclic groups are abelian and hence their subgroups are also abelian. However 
as an almost direct consequence of the division algorithm, we get that any subgroup 
of a cyclic group must be cyclic. 


Lemma 2.4.12 Let $G$ be a cyclic group. Then any subgroup of $G$ is also cyclic.


Proof Suppose $G = \langle g \rangle$ and $H \subseteq G$ is a subgroup. If $H = \{1\}$ then $H = \langle 1 \rangle$ is cyclic, so assume $H \ne \{1\}$. Since $G$ consists of powers of $g$, $H$ also consists of certain powers of $g$, and since $H$ is closed under inverses it contains a positive power of $g$. Let $k$ be the least positive integer such that $g^k \in H$. We show that $H = \langle g^k \rangle$, that is, $H$ is the cyclic subgroup generated by $g^k$. This is clearly equivalent to showing that every $h \in H$ must be a power of $g^k$.
Suppose $g^t \in H$. If $t < 0$ work with $-t$, since $g^{-t} = (g^t)^{-1} \in H$; if $t = 0$ then $g^t = (g^k)^0$. So we may assume that $t > 0$, and then $t \ge k$ since $k$ is the least positive integer such that $g^k \in H$. By the division algorithm, we then have
$$t = qk + r \text{ with } r = 0 \text{ or } 0 < r < k.$$
If $r \ne 0$ then $0 < r < k$ and $r = t - qk$. Hence $g^r = g^{t - qk} = g^t (g^k)^{-q}$. Now $g^t \in H$ and $g^k \in H$, and since $H$ is a subgroup it follows that $g^{t - qk} \in H$. But then $g^r \in H$, which is a contradiction since $0 < r < k$ and $k$ is the least positive power of $g$ in $H$. Therefore, $r = 0$ and $t = qk$. We then have
$$g^t = g^{qk} = (g^k)^q,$$
completing the proof.


Each element of a cyclic group $G$ generates its own cyclic subgroup. The question is when this cyclic subgroup coincides with all of $G$; in particular, which powers $g^k$ are generators of $G$? The answer is purely number theoretic.


Lemma 2.4.13 (1) Let $G = \langle g \rangle$ be a finite cyclic group of order $n$. Then $g^k$ with $k > 0$ is a generator of $G$ if and only if $(k, n) = 1$, that is, $k$ and $n$ are relatively prime.


(2) If $G = \langle g \rangle$ is an infinite cyclic group then $g, g^{-1}$ are the only generators.
Proof Suppose first that $G = \langle g \rangle$ is finite cyclic of order $n$ and suppose that $(k, n) = 1$. Then there exist integers $x, y$ such that $kx + ny = 1$. It follows then that
$$g = g^1 = g^{kx + ny} = (g^k)^x (g^n)^y.$$
But $g^n = 1$ so $(g^n)^y = 1$ and therefore
$$g = (g^k)^x.$$


Therefore, $g$ is a power of $g^k$ and hence every power of $g$ is also a power of $g^k$. The whole group $G$ then consists of powers of $g^k$ and hence $g^k$ is a generator for $G$.

Conversely, suppose that $g^k$ is also a generator for $G$. Then there exists a power $x$ such that $g = (g^k)^x = g^{kx}$. Hence $kx \equiv 1 \bmod n$ and so $k$ is a unit mod $n$, which implies from the last section that $(k, n) = 1$.

Suppose next that $G = \langle g \rangle$ is infinite cyclic. Then there is no nonzero power of $g$ which is the identity. Suppose $g^k$ is also a generator. Then there exists a power $x$ such that $g = (g^k)^x = g^{kx}$. But this implies that $g^{kx - 1} = 1$, so $kx - 1 = 0$ since no nonzero power of $g$ is the identity. Hence $kx = 1$, so $k = \pm 1$ and $g^k$ is $g$ or $g^{-1}$.


Recall that $\phi(n)$ denotes the number of positive integers less than or equal to $n$ which are relatively prime to $n$. By Lemma 2.4.13 this is then the number of generators of a cyclic group of order $n$.
Corollary 2.4.3 Let $G$ be a finite cyclic group of order $n$. Then there are $\phi(n)$ generators for $G$.


By Lagrange's theorem (Theorem 2.4.9), for any finite group the order of a subgroup divides the order of the group, that is, if $|G| = n$ and $|H| = d$ with $H$ a subgroup of $G$ then $d \mid n$. However, the converse is not true in general, that is, if $|G| = n$ and $d \mid n$ there need not be a subgroup of order $d$. Further, if there is a subgroup of order $d$ there may or may not be other subgroups of order $d$. For a finite cyclic group $G$ of order $n$, however, there is for each $d \mid n$ a unique subgroup of order $d$.


Theorem 2.4.15 Let $G$ be a finite cyclic group of order $n$. Then for each $d \mid n$ with $d \ge 1$ there exists a unique subgroup $H$ of order $d$.


Proof Let $G = \langle g \rangle$ and $|G| = n$. Suppose $d \mid n$; then $n = kd$. Consider the element $g^k$. Then $(g^k)^d = g^{kd} = g^n = 1$. Further, if $0 < t < d$ then $0 < kt < kd$, so $kt \not\equiv 0 \bmod n$ and hence $(g^k)^t = g^{kt} \ne 1$. Therefore, $d$ is the least positive power of $g^k$ which is the identity, and hence $g^k$ has order $d$ and generates a cyclic subgroup of order $d$. We must show that this subgroup is unique.

Suppose $H = \langle g^t \rangle$ is another cyclic subgroup of order $d$ (recall that all subgroups of $G$ are cyclic). We may assume that $t > 0$, and we show that $g^t$ is a power of $g^k$ and hence the subgroups coincide. The proof is essentially the same as the proof of Lemma 2.4.12.

Since $H$ has order $d$ we have $g^{td} = 1$, which implies that $td \equiv 0 \bmod n$. Since $n = kd$ and $t > 0$, it follows that $t \ge k$. Apply the division algorithm:
$$t = qk + r \text{ with } 0 \le r < k.$$
If $r \ne 0$ then $0 < r < k$ and $r = t - qk$. Then
$$r = t - qk \implies rd = td - qkd \equiv 0 \bmod n.$$


Hence $n \mid rd$, which is impossible since $0 < rd < kd = n$. Therefore, $r = 0$ and $t = qk$. From this
$$g^t = g^{qk} = (g^k)^q.$$


Therefore, $g^t$ is a power of $g^k$ and $H = \langle g^k \rangle$.
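Theorem 2.4.15 can be checked concretely in $(\mathbb{Z}_n, +)$, the cyclic group of order $n$ generated by 1, where $g^t$ corresponds to $t$. The Python sketch below lists the subgroup generated by each $t$ (by Lemma 2.4.12 these are all the subgroups). It then checks that for each $d \mid n$ there is exactly one subgroup of order $d$, namely the one generated by $k = n/d$ as in the proof. The names are hypothetical.

```python
def cyclic_subgroup(t: int, n: int) -> frozenset:
    """Subgroup of (Z_n, +) generated by t; additively, <g^t> = {0, t, 2t, ...}."""
    return frozenset(t * j % n for j in range(n))


def subgroups_by_order(n: int) -> dict:
    """Map each order d to the set of distinct subgroups of (Z_n, +) of that order."""
    found = {}
    for t in range(n):
        H = cyclic_subgroup(t, n)
        found.setdefault(len(H), set()).add(H)
    return found


def unique_subgroup_property(n: int) -> bool:
    """For every d | n there is exactly one subgroup of order d, and it is <n/d>."""
    found = subgroups_by_order(n)
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return sorted(found) == divisors and all(
        found[d] == {cyclic_subgroup(n // d, n)} for d in divisors
    )


print(sorted(cyclic_subgroup(3, 12)))                       # [0, 3, 6, 9]
print(all(unique_subgroup_property(n) for n in range(1, 60)))  # True
```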


We now use this result to give an alternate proof of Theorem 2.4.8. 


Theorem 2.4.16 For $n \ge 1$,
$$\sum_{d \mid n} \phi(d) = n.$$


Proof Consider a cyclic group $G$ of order $n$. For each $d \mid n$, $d \ge 1$, there is a unique cyclic subgroup $H_d$ of order $d$, and $H_d$ then has $\phi(d)$ generators. Each element of $G$ generates its own cyclic subgroup, say of order $d$, which must be $H_d$, and hence the element is one of the $\phi(d)$ generators of $H_d$. Therefore,
$$\sum_{d \mid n} \phi(d) = \text{sum of the numbers of generators of the cyclic subgroups of } G.$$
Since each element of $G$ generates exactly one cyclic subgroup, every element of $G$ is counted exactly once in this sum, and hence the sum is $n$.
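The counting in this proof can be watched directly in $(\mathbb{Z}_n, +)$. The Python sketch below sorts the elements by their order (the order of the subgroup each one generates) and checks that exactly $\phi(d)$ elements have order $d$. The total count of generators over all $d$ is therefore $n$. The order is found by brute force, and the names are our own.

```python
from math import gcd


def additive_order(t: int, n: int) -> int:
    """Least m >= 1 with m*t ≡ 0 (mod n): the order of t in (Z_n, +)."""
    m = 1
    while (m * t) % n != 0:
        m += 1
    return m


def phi(n: int) -> int:
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order_counts(n: int) -> dict:
    """How many elements of Z_n have each order d."""
    counts = {}
    for t in range(n):
        d = additive_order(t, n)
        counts[d] = counts.get(d, 0) + 1
    return counts


counts = order_counts(12)
print(sorted(counts.items()))   # [(1, 1), (2, 1), (3, 2), (4, 2), (6, 2), (12, 4)]
print(all(counts[d] == phi(d) for d in counts), sum(counts.values()))  # True 12
```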


2.5 The Solution of Polynomial Congruences Modulo m 


We are interested in solving polynomial congruences mod $m$, that is, solving polynomial equations
$$f(x) \equiv 0 \bmod m$$
where $f(x)$ is a nonzero polynomial with coefficients in $\mathbb{Z}_m$, the ring of integers modulo $m$. Typical examples might be
$$4x^2 + 3x - 2 \equiv 0 \bmod 12 \quad \text{or} \quad 4x + 5 \equiv 0 \bmod 7.$$


Of course the solution of such congruences is given in terms of residue classes, for if $x \equiv y \bmod m$ then $f(x) \equiv f(y) \bmod m$. Hence if $x$ is a solution to a polynomial congruence then so is every integer congruent to it modulo $m$.

As has been our general procedure, we will reduce the solution of polynomial congruences to the solution modulo primes (more precisely, modulo prime powers) and then try to paste general solutions back together via the fundamental theorem of arithmetic. Suppose then that $m$ has the prime factorization $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and that $x_0$ is a solution of $f(x) \equiv 0 \bmod m$. Then $x_0$ is also a solution of $f(x) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$. Then for each $i = 1, \ldots, k$ there is a $y_i$ with $x_0 \equiv y_i \bmod p_i^{e_i}$. Conversely, suppose we are given $y_i$ with $f(y_i) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$; then there is a technique based on what is called the Chinese remainder theorem, which we will discuss shortly, to piece these $y_i$ together to get a solution $x_0$ of $f(x) \equiv 0 \bmod m$.

As a first step, we will describe the solution of linear congruences and the Chinese 
remainder theorem and then move on to higher degree congruences. 


2.5.1 Linear Congruences and the Chinese Remainder Theorem


A linear congruence is of the form $ax + b \equiv 0 \bmod m$ where $a \not\equiv 0 \bmod m$. In this section, we will consider solutions of linear congruences.
Before proceeding further, we note that solving a polynomial congruence
$$f(x) \equiv 0 \bmod m$$
is essentially equivalent to solving a polynomial equation
$$f(x) = 0$$
in the modular ring $\mathbb{Z}_m$. The solutions of the congruence are precisely the congruence classes modulo $m$ of the solutions of this equation in $\mathbb{Z}_m$.
For example, the congruence
$$2x \equiv 4 \bmod 5$$
is equivalent to the equation
$$2x = 4$$
in $\mathbb{Z}_5$. The unique solution in $\mathbb{Z}_5$ is $x = 2$, so that the solution of the congruence is $x \equiv 2 \bmod 5$. We will move freely between the two approaches to solving congruences, using $\equiv$ for congruence mod $m$ and $=$ for equality in $\mathbb{Z}_m$.
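Because the solutions are residue classes, a small congruence can always be solved by trying every element of $\mathbb{Z}_m$. The Python sketch below does exactly that, evaluating $f$ by Horner's rule with reduction mod $m$ at each step. It represents a polynomial by its coefficient list $[a_0, a_1, \ldots]$, which is our own convention. The exhaustive search is only practical for small $m$; the rest of this section develops better methods.

```python
def poly_mod(coeffs: list, x: int, m: int) -> int:
    """Evaluate f(x) = coeffs[0] + coeffs[1]*x + ... mod m with Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % m
    return result


def solve_poly_congruence(coeffs: list, m: int) -> list:
    """All residues x in 0..m-1 with f(x) ≡ 0 (mod m), found by trying every element of Z_m."""
    return [x for x in range(m) if poly_mod(coeffs, x, m) == 0]


print(solve_poly_congruence([-4, 2], 5))      # [2]  2x - 4 ≡ 0 (mod 5)
print(solve_poly_congruence([5, 4], 7))       # [4]  4x + 5 ≡ 0 (mod 7)
print(solve_poly_congruence([-2, 3, 4], 12))  # []   4x^2 + 3x - 2 ≡ 0 (mod 12) has no solution
```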

Now we consider the linear congruence $ax + b \equiv 0 \bmod m$ where $a$ is noncongruent to $0 \bmod m$. For $m = p$, $p$ a prime, the solution is immediate and it is unique. Since $\mathbb{Z}_p$ is a field and $a \ne 0$, the element $a$ has an inverse. Therefore, the solution in $\mathbb{Z}_p$ is
$$x = a^{-1}(-b)$$
and any solution $x_0$ must be of the form $x_0 \equiv a^{-1}(-b) \bmod p$.


EXAMPLE 2.5.1.1 Solve $3x + 4 \equiv 0 \bmod 7$.


From the formal field properties, the solution is $x = 3^{-1} \cdot (-4)$. In $\mathbb{Z}_7$ we have $-4 = 3$, and since $3 \cdot 5 \equiv 1 \bmod 7$ it follows that $3^{-1} = 5$. Therefore, the solution is
$$x = 5 \cdot 3 = 15 \equiv 1 \bmod 7.$$


Essentially the same method works if $m$ is not prime but $(a, m) = 1$. In this case $a$ is a unit in $\mathbb{Z}_m$ and the unique solution is $x = a^{-1}(-b)$. Consider the same equation as in Example 2.5.1.1 but modulo 8, that is,
$$3x + 4 \equiv 0 \bmod 8 \implies x \equiv 3^{-1} \cdot (-4) \bmod 8.$$
Modulo 8 we have $-4 \equiv 4$ and $3^{-1} \equiv 3$ (since $3 \cdot 3 = 9 \equiv 1$), so the solution is $x \equiv 4 \cdot 3 = 12 \equiv 4 \bmod 8$.

If $(a, m) \ne 1$ the situation becomes more complicated. We have the following theorem, which describes the solutions and provides a technique for finding all solutions.
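Theorem 2.5.1 Consider $ax + b \equiv 0 \bmod m$ with $(a, m) = d > 1$. Then the congruence is solvable if and only if $d \mid b$. In this case there are exactly $d$ solutions modulo $m$, given by
$$x = x_0 + \frac{m}{d}\, t, \quad t = 0, 1, \ldots, d - 1,$$
where $x_0$ is any solution of the reduced congruence
$$\frac{a}{d}x + \frac{b}{d} \equiv 0 \bmod \frac{m}{d}.$$


Proof Let $d = (a, m)$. If $x_0$ is a solution then $b \equiv -a x_0 \bmod m$, so $b = -a x_0 + tm$ for some $t$. Therefore, $d \mid b$. Hence if $d$ does not divide $b$ there is no solution. Suppose then that $d \mid b$. Then $\left(\frac{a}{d}, \frac{m}{d}\right) = 1$ and the reduced congruence
$$\frac{a}{d}x + \frac{b}{d} \equiv 0 \bmod \frac{m}{d}$$
has a unique solution modulo $\frac{m}{d}$, say $x_0$. Since $ax + b = d\left(\frac{a}{d}x + \frac{b}{d}\right)$, an integer $x$ satisfies the original congruence modulo $m$ if and only if it satisfies the reduced congruence modulo $\frac{m}{d}$. In particular $x_0$ is a solution of the original congruence, and the solutions are exactly the integers congruent to $x_0$ modulo $\frac{m}{d}$, that is, the integers of the form $x = x_0 + \frac{m}{d}t$. However, only $d$ of these are incongruent modulo $m$: it is easy to check that $x = x_0 + \frac{m}{d}t$, $t = 0, 1, \ldots, d - 1$, are incongruent modulo $m$, and any $x_0 + \frac{m}{d}t$ is congruent modulo $m$ to one of them.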


The problem of solving a linear congruence is thus reduced to finding a single solution of a congruence of the form $ax \equiv b \bmod m$ with $(a, m) = 1$. The solution is then $x \equiv a^{-1}b$, where $a^{-1}$ is the inverse of $a$ mod $m$. As explained in Section 2.4.3 this can be found using the Euclidean algorithm.
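Theorem 2.5.1 translates almost line by line into code. The Python sketch below follows its steps: test whether $d = (a, m)$ divides $b$, solve the reduced congruence with a modular inverse, and list the $d$ lifts $x_0 + \frac{m}{d}t$. The inverse comes from Python's built-in `pow(x, -1, m)` (Python 3.8+) rather than a hand-written Euclidean algorithm. The function name is hypothetical, and the sketch assumes $a \not\equiv 0 \bmod m$ as in the text.

```python
from math import gcd


def solve_linear_congruence(a: int, b: int, m: int) -> list:
    """All solutions of a*x + b ≡ 0 (mod m) as residues 0..m-1, following Theorem 2.5.1.

    Assumes m >= 2 and a not ≡ 0 (mod m); uses pow(..., -1, ...) (Python 3.8+).
    """
    d = gcd(a, m)
    if b % d != 0:
        return []                                        # no solution unless d | b
    a_r, b_r, m_r = a // d, b // d, m // d               # reduced congruence, gcd(a_r, m_r) = 1
    x0 = (-b_r * pow(a_r, -1, m_r)) % m_r                # unique solution modulo m/d
    return sorted((x0 + m_r * t) % m for t in range(d))  # the d solutions modulo m


print(solve_linear_congruence(26, 81, 245))   # [44]
print(solve_linear_congruence(78, 243, 735))  # [44, 289, 534]
```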

EXAMPLE 2.5.1.2 Solve $26x + 81 \equiv 0 \bmod 245$.

We apply the Euclidean algorithm both to determine whether $(26, 245) = 1$ and, if so, to find the inverse of 26 mod 245:
$$245 = (9)(26) + 11$$
$$26 = (2)(11) + 4$$
$$11 = (2)(4) + 3$$
$$4 = (1)(3) + 1.$$


Therefore, $(245, 26) = 1$. Working backward, we express 1 as a linear combination of 26 and 245:
$$1 = 4 - (1)(3) = 4 - \big(11 - (2)(4)\big) = (3)(4) - 11 = \cdots = (66)(26) - (7)(245).$$
Hence modulo 245 we have $66 \cdot 26 \equiv 1$ and $26^{-1} \equiv 66$. Therefore, the solution is
$$x \equiv (26^{-1})(-81) \implies x \equiv (66)(164) = 10824 \equiv 44 \bmod 245.$$


EXAMPLE 2.5.1.3 Solve $78x + 243 \equiv 0 \bmod 735$.
Using the Euclidean algorithm, we find that $(78, 735) = 3$ and $3 \mid 243$. The reduced congruence is
$$\frac{78}{3}x + \frac{243}{3} \equiv 0 \bmod \frac{735}{3} \implies 26x + 81 \equiv 0 \bmod 245.$$
From the previous example, the solution to the reduced congruence is $x_0 = 44$, with $d = 3$. The solutions mod 735 are then
$$x = x_0 + \frac{735}{3}t \implies x = 44 + 245t, \quad t = 0, 1, 2$$
$$\implies x \equiv 44, 289, 534 \bmod 735.$$


The methods above provide techniques for solving linear congruences. Systems 
of linear congruences are handled by the next result which is called the Chinese 
remainder theorem. 


Theorem 2.5.2 (Chinese Remainder Theorem) Suppose that $m_1, m_2, \ldots, m_k$ are $k$ positive integers that are relatively prime in pairs. If $a_1, \ldots, a_k$ are any integers then the simultaneous congruences
$$x \equiv a_i \bmod m_i, \quad i = 1, \ldots, k$$
have a common solution which is unique modulo $m_1 m_2 \cdots m_k$.


Proof The proof we give not only provides a verification but also provides a technique for finding the common solution.
Let $m = m_1 m_2 \cdots m_k$. Since the $m_i$ are relatively prime in pairs we have $\left(\frac{m}{m_i}, m_i\right) = 1$. Therefore, there is a solution $x_i$ to the reduced congruence
$$\frac{m}{m_i}x_i \equiv 1 \bmod m_i.$$
Further, for $x_i$ we clearly have
$$\frac{m}{m_i}x_i \equiv 0 \bmod m_j \text{ if } i \ne j,$$
since $m_j$ divides $\frac{m}{m_i}$ when $i \ne j$.
Now let
$$x_0 = \sum_{i=1}^{k} \frac{m}{m_i} x_i a_i.$$
We claim that $x_0$ is a solution to the simultaneous congruences and that it is unique modulo $m$.
Now, for each $j$,
$$x_0 = \sum_{i=1}^{k} \frac{m}{m_i} x_i a_i \equiv \frac{m}{m_j} x_j a_j \bmod m_j$$
since $\frac{m}{m_i} x_i \equiv 0 \bmod m_j$ if $i \ne j$. It follows then that
$$x_0 \equiv \frac{m}{m_j} x_j a_j \equiv a_j \bmod m_j$$
since $\frac{m}{m_j} x_j \equiv 1 \bmod m_j$. Therefore, $x_0$ is a common solution. We must show the uniqueness part.
If $x'$ is another common solution then $x' \equiv x_0 \bmod m_i$ for $i = 1, \ldots, k$, so each $m_i$ divides $x' - x_0$. Since the $m_i$ are relatively prime in pairs, their product $m$ divides $x' - x_0$. Therefore, $x' \equiv x_0 \bmod m$.
We note that if the integers $m_i$ are not relatively prime in pairs there may be no solution to the simultaneous congruences.
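The proof is constructive, so it converts directly into a procedure. The Python sketch below computes $x_0 = \sum_i \frac{m}{m_i} x_i a_i$ with $x_i$ the inverse of $\frac{m}{m_i}$ modulo $m_i$, obtained from the built-in `pow(..., -1, ...)` (Python 3.8+). It also uses `math.prod` (also 3.8+). The name `crt` is our own, and the sketch assumes the moduli are pairwise relatively prime. When they are not, `pow` may raise `ValueError` because some $\frac{m}{m_i}$ is not invertible modulo $m_i$.

```python
from math import prod


def crt(residues: list, moduli: list) -> int:
    """Solve x ≡ a_i (mod m_i) for pairwise coprime m_i, as in the proof of Theorem 2.5.2."""
    m = prod(moduli)
    x0 = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = m // m_i
        x_i = pow(M_i, -1, m_i)      # (m/m_i) * x_i ≡ 1 (mod m_i)
        x0 += M_i * x_i * a_i        # this term is ≡ a_i mod m_i and ≡ 0 mod every other m_j
    return x0 % m


print(crt([6, 9, 12], [13, 45, 17]))  # 4959, as in Example 2.5.1.4
print(crt([1, 4, 9], [4, 9, 13]))     # 373, as in Example 2.5.1.5
```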


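EXAMPLE 2.5.1.4 Solve the simultaneous congruences
$$x \equiv 6 \bmod 13$$
$$x \equiv 9 \bmod 45$$
$$x \equiv 12 \bmod 17.$$


Here $m_1 = 13$, $m_2 = 45$, $m_3 = 17$, so $m = 13 \cdot 45 \cdot 17 = 9945$. We first solve
$$(17)(45)x \equiv 1 \bmod 13 \implies x = 6$$
$$(13)(17)x \equiv 1 \bmod 45 \implies x = 11$$
$$(13)(45)x \equiv 1 \bmod 17 \implies x = 5.$$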

To see how these solutions are found, let us look at the second one:
$$(13)(17)x \equiv 1 \bmod 45 \implies 221x \equiv 1 \bmod 45 \implies 41x \equiv 1 \bmod 45$$
since $221 \equiv 41 \bmod 45$. We now use the Euclidean algorithm:
$$45 = 1 \cdot 41 + 4,\quad 41 = 10 \cdot 4 + 1 \implies 1 = (11)(41) - (10)(45) \implies 41^{-1} \equiv 11 \bmod 45.$$
Therefore, using these solutions, the common solution is
$$x_0 = \frac{13 \cdot 45 \cdot 17}{13}(6)(6) + \frac{13 \cdot 45 \cdot 17}{45}(11)(9) + \frac{13 \cdot 45 \cdot 17}{17}(5)(12)$$
$$\implies x_0 = 27540 + 21879 + 35100 = 84519 \equiv 4959 \bmod 9945$$
$$\implies x_0 = 4959.$$


The Chinese remainder theorem can also be used to piece together the solution of a single linear congruence.


EXAMPLE 2.5.1.5 Solve $5x + 7 \equiv 0 \pmod{468}$.

Now $(468, 5) = 1$ so the solution is $x \equiv 5^{-1}(-7) \pmod{468}$. The prime decomposition is $468 = 2^2 \cdot 3^2 \cdot 13$. Therefore, the solution can be considered as the simultaneous solution of

$$x \equiv 5^{-1}(-7) \pmod{2^2} \implies x \equiv 1 \pmod{4}$$
$$x \equiv 5^{-1}(-7) \pmod{3^2} \implies x \equiv 4 \pmod{9}$$
$$x \equiv 5^{-1}(-7) \pmod{13} \implies x \equiv 9 \pmod{13}.$$

Letting $m_1 = 4$, $m_2 = 9$, $m_3 = 13$, and $m = 468$, then as before we first solve

$$(9)(13)x \equiv 1 \pmod{4} \implies x \equiv 1 \pmod{4}$$
$$(4)(13)x \equiv 1 \pmod{9} \implies x \equiv 4 \pmod{9}$$
$$(4)(9)x \equiv 1 \pmod{13} \implies x \equiv 4 \pmod{13}$$

The common solution is

$$x_0 = (9)(13)(1)(1) + (4)(13)(4)(4) + (4)(9)(4)(9) = 2245 \equiv 373 \pmod{468}$$
$$\implies x_0 = 373.$$
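In the same spirit, here is a short Python sketch of Example 2.5.1.5. The helper `solve_linear_by_crt` is a hypothetical name of ours; it solves $ax + b \equiv 0$ modulo each prime power separately and recombines the pieces, assuming $(a, m) = 1$ and pairwise relatively prime moduli.

```python
from math import prod


def solve_linear_by_crt(a, b, prime_powers):
    """Solve a*x + b = 0 (mod m), where m is the product of the pairwise
    relatively prime moduli in `prime_powers` and gcd(a, m) = 1.

    Each congruence x = a^{-1}(-b) (mod q) is solved separately and the
    pieces are recombined with the Chinese remainder theorem.
    """
    m = prod(prime_powers)
    x0 = 0
    for q in prime_powers:
        y = (-b * pow(a, -1, q)) % q   # solution modulo q
        M = m // q
        z = pow(M, -1, q)              # (m/q) * z = 1 (mod q)
        x0 += M * z * y
    return x0 % m


print(solve_linear_by_crt(5, 7, [4, 9, 13]))  # 373
```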
In the previous sections, we noted that for any natural number $n$, the additive group of $\mathbb{Z}_n$ and the group of units of $\mathbb{Z}_n$ are finite abelian groups. As an easy consequence of the Chinese remainder theorem, we have the following result.

Theorem 2.5.3 For any natural number $n$ let $(\mathbb{Z}_n, +)$ denote the additive group of $\mathbb{Z}_n$ and let $U(\mathbb{Z}_n)$ be the group of units of $\mathbb{Z}_n$. Let $n = n_1 n_2 \cdots n_k$ be a factorization of $n$ with pairwise relatively prime factors. Then

$$(\mathbb{Z}_n, +) \cong (\mathbb{Z}_{n_1}, +) \times (\mathbb{Z}_{n_2}, +) \times \cdots \times (\mathbb{Z}_{n_k}, +)$$
$$U(\mathbb{Z}_n) \cong U(\mathbb{Z}_{n_1}) \times \cdots \times U(\mathbb{Z}_{n_k}).$$


We leave the proof to the exercises. 
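Theorem 2.5.3 can at least be checked numerically for a particular factorization. The Python sketch below (helper names are our own) tests that the reduction map $x \mapsto (x \bmod n_1, \dots, x \bmod n_k)$ is a bijection and sends the units of $\mathbb{Z}_n$ exactly onto the tuples of units. Since reduction modulo each $n_i$ respects addition and multiplication, this map is the isomorphism of the theorem. It is an illustration only, not the proof asked for in the exercises.

```python
from math import gcd, prod


def residue_map(x, factors):
    """The map Z_n -> Z_{n_1} x ... x Z_{n_k}, x -> (x mod n_1, ..., x mod n_k)."""
    return tuple(x % q for q in factors)


def check_decomposition(factors):
    """Check that the map is a bijection and carries the units of Z_n exactly
    onto the tuples whose entries are units in each Z_{n_i}."""
    n = prod(factors)
    images = {residue_map(x, factors) for x in range(n)}
    if len(images) != n:   # not injective, so not a bijection
        return False
    unit_images = {residue_map(x, factors) for x in range(n) if gcd(x, n) == 1}
    unit_tuples = {t for t in images
                   if all(gcd(t_i, q) == 1 for t_i, q in zip(t, factors))}
    return unit_images == unit_tuples


print(check_decomposition([4, 9, 13]), check_decomposition([4, 6]))  # True False
```

The second call fails because 4 and 6 are not relatively prime.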


2.5.2 Higher Degree Congruences 


Now that we have handled linear congruences, we turn to the problem of solving higher degree polynomial congruences

$$f(x) \equiv 0 \pmod{m} \qquad (2.5.3)$$

where $f(x)$ is a nonconstant integral polynomial of degree $k > 1$. Suppose that

$$f(x) = a_0 + a_1 x + \cdots + a_k x^k \quad \text{and} \quad g(x) = b_0 + b_1 x + \cdots + b_k x^k$$

where $a_i \equiv b_i \pmod{m}$ for $i = 0, \dots, k$. Then $f(c) \equiv g(c) \pmod{m}$ for any integer $c$, and hence the roots of $f(x)$ modulo $m$ are the same as those of $g(x)$ modulo $m$. Therefore, we may assume that in (2.5.3) the polynomial $f(x)$ is actually a polynomial with coefficients in $\mathbb{Z}_m$.

As remarked earlier, if $m$ has the prime factorization $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $x_0$ is a solution of $f(x) \equiv 0 \pmod{m}$, then $x_0$ is also a solution of $f(x) \equiv 0 \pmod{p_i^{e_i}}$ for $i = 1, \dots, k$. Then for each $i = 1, \dots, k$ there is a solution $y_i$ of $f(x) \equiv 0 \pmod{p_i^{e_i}}$ with $x_0 \equiv y_i \pmod{p_i^{e_i}}$. Conversely, suppose we are given $y_i$ with $f(y_i) \equiv 0 \pmod{p_i^{e_i}}$ for $i = 1, \dots, k$. Then the Chinese remainder theorem can be used to patch these $y_i$ together to get a solution $x_0$ of $f(x) \equiv 0 \pmod{m}$. Specifically,

$$x_0 = \sum_{i=1}^{k} \frac{m}{p_i^{e_i}} z_i y_i$$

would give a solution, where the $z_i$ are determined so that

$$\frac{m}{p_i^{e_i}} z_i \equiv 1 \pmod{p_i^{e_i}}.$$
EXAMPLE 2.5.2.1 Solve $x^2 + 7x + 4 \equiv 0 \pmod{33}$.
Since $33 = 3 \cdot 11$ we consider $x^2 + 7x + 4 \equiv 0 \pmod{3}$ and $x^2 + 7x + 4 \equiv 0 \pmod{11}$.

First,

$$x^2 + 7x + 4 \equiv 0 \pmod{3} \implies x^2 + x + 1 \equiv 0 \pmod{3} \implies x \equiv 1 \pmod{3}$$

and this is the only solution. Notice that in $\mathbb{Z}_3$ we have $(x + 2)^2 = x^2 + x + 1$. Now modulo 11 we have

$$x^2 + 7x + 4 \equiv 0 \implies x^2 - 4x + 4 \equiv 0 \implies (x - 2)^2 \equiv 0 \implies x \equiv 2$$

and this is the only solution. Therefore, a solution modulo 33 would be given by the solution of the pair of congruences

$$x \equiv 1 \pmod{3}$$
$$x \equiv 2 \pmod{11}.$$

Now $11y \equiv 1 \pmod{3} \implies y = 2$ and $3y \equiv 1 \pmod{11} \implies y = 4$, so by the Chinese remainder theorem the solution modulo 33 is

$$x = (11)(2)(1) + (3)(4)(2) = 46 \equiv 13 \pmod{33}.$$


Hence we have reduced the problem of solving polynomial congruences to the problem of solving modulo prime powers. From the algorithm using the Chinese remainder theorem, we can further give the total number of solutions. If $f(x)$ is a polynomial with coefficients in $\mathbb{Z}_m$, we let $N_f(m)$ denote the number of solutions of $f(x) \equiv 0 \pmod{m}$. Then

Theorem 2.5.4 If $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime decomposition of $m$ then

$$N_f(m) = N_f(p_1^{e_1}) N_f(p_2^{e_2}) \cdots N_f(p_k^{e_k}).$$
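For small moduli, $N_f(m)$ can be computed by exhaustive search, which gives a quick check of Example 2.5.2.1 and of Theorem 2.5.4. This Python sketch relies on nothing beyond brute force. `roots_mod` is a hypothetical helper name, and the polynomial $x^2 - 1$ is an extra example of our own.

```python
def roots_mod(coeffs, m):
    """All x in Z_m with f(x) = 0 (mod m), where coeffs = [a_0, a_1, ..., a_k]."""
    return [x for x in range(m)
            if sum(a * pow(x, i, m) for i, a in enumerate(coeffs)) % m == 0]


f = [4, 7, 1]       # x^2 + 7x + 4 from Example 2.5.2.1
g = [-1, 0, 1]      # x^2 - 1, a hypothetical second example
print(roots_mod(f, 3), roots_mod(f, 11), roots_mod(f, 33))               # [1] [2] [13]
print(len(roots_mod(g, 3)), len(roots_mod(g, 5)), len(roots_mod(g, 15)))  # 2 2 4
```

In both cases the count modulo the composite is the product of the counts modulo its prime factors.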


The simplest case of solving modulo a prime power $p^\alpha$ is of course when $\alpha = 1$. Then we are attempting to find solutions within $\mathbb{Z}_p$. Recalling that if $p$ is a prime then $\mathbb{Z}_p$ is a field, we can use certain basic properties of equations over fields to further simplify the problem. First, recalling that in a field a polynomial of degree $n$ can have at most $n$ distinct roots, we get:

Theorem 2.5.5 The polynomial congruence $f(x) \equiv 0 \pmod{p}$, $p$ prime, has at most $k$ solutions if the degree of $f(x)$ is $k$.


Recall that from Fermat's theorem $x^p = x$ for any $x \in \mathbb{Z}_p$. This implies that every element of $\mathbb{Z}_p$ is a root of the polynomial $x^p - x$. Suppose that $f(x)$ is a polynomial of degree higher than $p$ over $\mathbb{Z}_p$. Using the division algorithm for polynomials, we then have

$$f(x) = q(x)(x^p - x) + g(x) \quad \text{where } g(x) = 0 \text{ or } \deg(g(x)) < p.$$

Since every element of $\mathbb{Z}_p$ is a solution of $x^p - x = 0$, it follows that the solutions of $f(x) = 0$ are precisely the solutions of $g(x) = 0$. Hence we can always reduce a polynomial congruence modulo $p$ to a congruence of degree less than $p$.

Theorem 2.5.6 If $f(x)$ has degree higher than $p$, $p$ prime, then there exists a polynomial $h(x)$ of degree less than $p$ such that the solutions of $f(x) \equiv 0 \pmod{p}$ are exactly the solutions of $h(x) \equiv 0 \pmod{p}$.
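Because $x^p \equiv x$, the remainder of $f(x)$ on division by $x^p - x$ can be read off by folding exponents: $x^e$ with $e \ge 1$ becomes $x^{e'}$ where $1 \le e' \le p - 1$ and $e' \equiv e \pmod{p-1}$. Below is a Python sketch of this reduction. The function names and the sample polynomial $x^7 + 2x^5 + 3$ over $\mathbb{Z}_5$ are our own. The sketch also confirms that the two polynomials have the same roots.

```python
def reduce_mod_xp_minus_x(coeffs, p):
    """Remainder of f(x) on division by x^p - x, coefficients taken mod p.

    coeffs[e] is the coefficient of x^e. Since x^p = x (mod x^p - x), each x^e
    with e >= 1 folds to x^e2 with 1 <= e2 <= p - 1 and e2 = e (mod p - 1).
    """
    g = [0] * p
    for e, a in enumerate(coeffs):
        e2 = e if e == 0 else (e - 1) % (p - 1) + 1
        g[e2] = (g[e2] + a) % p
    return g


def roots_mod_p(coeffs, p):
    """All x in Z_p with f(x) = 0, by exhaustive search."""
    return [x for x in range(p)
            if sum(a * pow(x, e, p) for e, a in enumerate(coeffs)) % p == 0]


f = [3, 0, 0, 0, 0, 2, 0, 1]   # x^7 + 2x^5 + 3 over Z_5 (hypothetical example)
h = reduce_mod_xp_minus_x(f, 5)
print(h)                                      # [3, 2, 0, 1, 0], i.e. x^3 + 2x + 3
print(roots_mod_p(f, 5), roots_mod_p(h, 5))   # [2, 4] [2, 4]
```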
There is no general method to solve a polynomial congruence modulo a prime $p$. However, for degree 2 and $p$ an odd prime the quadratic formula holds. First, some more definitions.

Definition 2.5.1 If $(a, m) = 1$ and $x^2 \equiv a \pmod{m}$ has a solution then $a$ is called a quadratic residue mod $m$. If $x^2 \equiv a \pmod{m}$ has no solution then $a$ is a quadratic nonresidue.

We will talk more about quadratic residues and nonresidues in the next section. However, modulo a prime, we get something special: $x^2 - a$ is a quadratic polynomial and hence in a field it can have at most two roots. Therefore,

Lemma 2.5.1 Given $(a, p) = 1$ with $p$ a prime. Suppose $a$ is a quadratic residue mod $p$ and $x_0^2 \equiv a \pmod{p}$. Then $-x_0$ is the only other solution and if $p$ is odd, $x_0$ and $-x_0$ are distinct.


If $a$ is a quadratic residue mod $p$ let $\sqrt{a}$ denote one of the two solutions to $x^2 \equiv a \pmod{p}$. We then obtain the quadratic formula modulo any odd prime.

Theorem 2.5.7 If $p$ is an odd prime then the solutions to the quadratic congruence $ax^2 + bx + c \equiv 0 \pmod{p}$ with $a \not\equiv 0 \pmod{p}$ are given by

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

In particular, if $b^2 - 4ac$ is a quadratic nonresidue mod $p$ then $ax^2 + bx + c \equiv 0$ has no solutions mod $p$.


Proof The development of the quadratic formula is solely dependent on the field properties and so can be carried out purely symbolically in $\mathbb{Z}_p$. Suppose

$$ax^2 + bx + c = 0 \quad \text{then} \quad x^2 + \frac{b}{a}x = \frac{-c}{a}.$$

Completing the square on the left side in the usual manner gives

$$x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a}$$

where $\frac{b^2}{4a^2}$ is defined since $4 \ne 0$ and $a^2 \ne 0$ in $\mathbb{Z}_p$ (since $p$ was odd). Then

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \implies x + \frac{b}{2a} = \pm\frac{\sqrt{b^2 - 4ac}}{2a}$$

where the square root has the meaning described above. Finally,

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$


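EXAMPLE 2.5.2.2 Solve $3x^2 + 5x + 1 \equiv 0 \pmod{7}$.
First, we divide through by 3. Since $3 \cdot 5 = 1$ in $\mathbb{Z}_7$, we have $3^{-1} = 5$ and so

$$3x^2 + 5x + 1 = 0 \implies x^2 + 25x + 5 = 0 \implies x^2 + 4x + 5 = 0.$$

Applying the quadratic formula

$$x = \frac{-4 \pm \sqrt{16 - 4(5)}}{2} = \frac{3 \pm \sqrt{7 - 4}}{2} = \frac{3 \pm \sqrt{3}}{2}.$$

Now 3 is a quadratic nonresidue mod 7 (the nonzero squares mod 7 are 1, 2 and 4), so the original congruence has no solutions modulo 7.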
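Theorem 2.5.7 can be turned into a small solver for odd primes $p$. In the Python sketch below the square root is found by exhaustive search, an assumption suitable only for small $p$; faster methods exist but are not discussed in the text. The function names are hypothetical.

```python
def sqrt_mod_p(d, p):
    """All square roots of d modulo p by exhaustive search (small p only)."""
    return [y for y in range(p) if (y * y - d) % p == 0]


def solve_quadratic_mod_p(a, b, c, p):
    """Solutions of a x^2 + b x + c = 0 (mod p), p an odd prime, a != 0 mod p,
    via x = (-b +/- sqrt(b^2 - 4ac)) / (2a) computed in Z_p."""
    inv_2a = pow(2 * a, -1, p)          # 1/(2a) in Z_p
    disc = (b * b - 4 * a * c) % p
    # Both signs are covered because sqrt_mod_p returns r and p - r.
    return sorted({(-b + r) * inv_2a % p for r in sqrt_mod_p(disc, p)})


print(solve_quadratic_mod_p(3, 5, 1, 7))    # [] as in Example 2.5.2.2
print(solve_quadratic_mod_p(1, 7, 4, 11))   # [2], the mod 11 part of Example 2.5.2.1
```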

For prime power moduli $p^\alpha$ with $\alpha > 1$ the general idea is to first find solutions mod $p$, if possible, and then move, using the found solutions iteratively, to solutions mod $p^2$, then to solutions mod $p^3$, and so on. There is an algorithm to handle this iterative procedure. We will not discuss this but refer the reader to [NZ] or [N] for more on this.
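The iterative idea can be illustrated with the simplest possible search. The Python sketch below is our own and is not the algorithm referred to in [NZ] or [N]. It uses the fact that a root modulo $p^{j+1}$ reduces to a root modulo $p^j$, so only the $p$ candidates $r + tp^j$ above each known root $r$ need to be tested.

```python
def lift_roots(coeffs, p, alpha):
    """Roots of f(x) = 0 (mod p^alpha), found one power of p at a time.

    Every root mod p^(j+1) reduces to a root mod p^j, so above each root r
    mod p^j only the p candidates r + t*p^j (0 <= t < p) need testing.
    This is a plain search over lifts, not a specialized lifting algorithm.
    """
    def f_mod(x, m):
        return sum(a * pow(x, e, m) for e, a in enumerate(coeffs)) % m

    roots = [r for r in range(p) if f_mod(r, p) == 0]
    modulus = p
    for _ in range(alpha - 1):
        next_modulus = modulus * p
        roots = [r + t * modulus for r in roots for t in range(p)
                 if f_mod(r + t * modulus, next_modulus) == 0]
        modulus = next_modulus
    return sorted(roots)


print(lift_roots([-2, 0, 1], 7, 2))   # roots of x^2 = 2 (mod 49): [10, 39]
```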


2.6 Quadratic Reciprocity 


We close this chapter on basic number theory with a discussion of a famous result due originally to Gauss, called the law of quadratic reciprocity. There are now dozens of proofs of this result in print, and the result has far-ranging implications well beyond what might be expected. Further, there are generalizations to algebraic number theory as well as applications to problems involving sums of squares.

Recall from the last section that if $x^2 \equiv a \pmod{n}$ has a solution then $a$ is called a quadratic residue mod $n$. If $n = p$, an odd prime, then there are exactly two solutions mod $p$. Suppose that $p, q$ are distinct odd primes. Then $p$ might be, or might not be, a quadratic residue mod $q$. Similarly $q$ might be, or might not be, a quadratic residue mod $p$. At first glance, there might seem to be no relationship between these two questions. Gauss discovered that there is a quite strong relationship, and this is the quadratic reciprocity law. In particular, if either of $p$ or $q$ is congruent to 1 mod 4 then either both of $x^2 \equiv p \pmod{q}$ and $x^2 \equiv q \pmod{p}$ are solvable or both are unsolvable. If both $p$ and $q$ are congruent to 3 mod 4 then one is solvable and the other is not. Before we state the theorem precisely, we introduce some terminology and machinery.

First, we give a criterion for an integer to be a quadratic residue modulo an odd prime.

Lemma 2.6.1 If $p$ is an odd prime and $(a, p) = 1$ then $a$ is a quadratic residue mod $p$ if and only if $a^{(p-1)/2} \equiv 1 \pmod{p}$. If $a$ is a quadratic nonresidue then $a^{(p-1)/2} \equiv -1 \pmod{p}$.

Proof Suppose $(a, p) = 1$. We do the computations in the field $\mathbb{Z}_p$. Since $a \ne 0$, from Fermat's theorem $a^{p-1} = 1$ in $\mathbb{Z}_p$. This implies that

$$\left(a^{(p-1)/2} - 1\right)\left(a^{(p-1)/2} + 1\right) = 0$$

in $\mathbb{Z}_p$. Since $\mathbb{Z}_p$ is a field it has no zero divisors, and this implies that either $a^{(p-1)/2} = 1$ or $a^{(p-1)/2} = -1$. Hence either $a^{(p-1)/2} \equiv 1 \pmod{p}$ or $a^{(p-1)/2} \equiv -1 \pmod{p}$. We show that in the former case and only in the former case is $a$ a quadratic residue.
Suppose that $x^2 = a$ has a solution, say $x_0$, in $\mathbb{Z}_p$. Then

$$a^{(p-1)/2} = \left(x_0^2\right)^{(p-1)/2} = x_0^{p-1} = 1.$$

It follows further that if $a^{(p-1)/2} = -1$ there can be no solution.
Conversely, suppose $a^{(p-1)/2} = 1$. Since the multiplicative group of $\mathbb{Z}_p$ is cyclic (see the last section), there is a $g \in \mathbb{Z}_p$ which generates this cyclic group, and $a = g^t$ for some $t$. Hence $g^{t(p-1)/2} = 1$. However, $g$ has order $p - 1$, the order of the multiplicative group of $\mathbb{Z}_p$, and therefore this implies that

$$\frac{t(p-1)}{2} \equiv 0 \pmod{p-1}.$$

Therefore, $t$ must be even, $t = 2k$. Hence $a = g^{2k} = (g^k)^2$ and there is a solution to $x^2 = a$.
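Lemma 2.6.1 (Euler's criterion) is easy to check numerically. The Python sketch below, with hypothetical helper names, compares an exhaustive search for a square root with the value of $a^{(p-1)/2} \bmod p$ for $p = 7$.

```python
def is_residue_by_search(a, p):
    """True if x^2 = a (mod p) has a solution; a is assumed prime to p."""
    return any((x * x - a) % p == 0 for x in range(1, p))


def euler_criterion(a, p):
    """Return a^((p-1)/2) mod p as +1 or -1 (Lemma 2.6.1); p an odd prime."""
    r = pow(a, (p - 1) // 2, p)
    return 1 if r == 1 else -1      # otherwise r == p - 1


p = 7
print([a for a in range(1, p) if is_residue_by_search(a, p)])   # [1, 2, 4]
print([euler_criterion(a, p) for a in range(1, p)])              # [1, 1, -1, 1, -1, -1]
```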


To express the quadratic reciprocity law in a succinct manner, we introduce the Legendre symbol.

Definition 2.6.1 If $p$ is an odd prime and $(a, p) = 1$ then the Legendre symbol $(a/p)$ is defined by
1. $(a/p) = 1$ if $a$ is a quadratic residue mod $p$.
2. $(a/p) = -1$ if $a$ is a quadratic nonresidue mod $p$.

Thus the value of the Legendre symbol distinguishes quadratic residues from quadratic nonresidues. The next lemma establishes the basic properties of $(a/p)$.


Lemma 2.6.2 If $p$ is an odd prime and $(a, p) = (b, p) = 1$ then
1. $(a^2/p) = 1$,
2. if $a \equiv b \pmod{p}$ then $(a/p) = (b/p)$,
3. $(a/p) \equiv a^{(p-1)/2} \pmod{p}$,
4. $(ab/p) = (a/p)(b/p)$.

Proof Parts (1) and (2) are immediate from the definition of the Legendre symbol. Part (3) is a direct consequence of Lemma 2.6.1. To see part (4) notice that $(ab)^{(p-1)/2} = a^{(p-1)/2} b^{(p-1)/2}$ and use part (3): both sides of (4) are $\pm 1$ and are congruent mod $p$, and since $p > 2$ they must be equal.


From part (4) of this last lemma, we see that to compute $(a/p)$ we can use the prime factorization of $a$ and then restrict to $(q/p)$ where $q$ is a prime distinct from $p$. The quadratic reciprocity law will allow us to compute this for odd primes $q$, and we will give a separate result for $(2/p)$. After proving the quadratic reciprocity law, we will give examples of how to do this. We now give the theorem.


Theorem 2.6.1 (Law of Quadratic Reciprocity) If $p, q$ are distinct odd primes then

$$(p/q)(q/p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

Alternatively, if $p, q$ are distinct odd primes then
(1) If at least one of $p, q$ is congruent to 1 mod 4 then

$$x^2 \equiv q \pmod{p} \quad \text{and} \quad x^2 \equiv p \pmod{q}$$

are either both solvable or both unsolvable.
(2) If both $p$ and $q$ are congruent to 3 mod 4 then one of

$$x^2 \equiv q \pmod{p} \quad \text{and} \quad x^2 \equiv p \pmod{q}$$

is solvable and the other is unsolvable.


Proof The proof we give is based on two lemmas due to Gauss and then a nice geometric argument due to Eisenstein.
Let $p, q$ be distinct odd primes and set $h = \frac{p-1}{2}$. Consider the set

$$R = \{-h, \dots, -2, -1, 1, 2, \dots, h\}.$$

This is a reduced residue system mod $p$, and hence every integer $a$ relatively prime to $p$, that is, with $(a, p) = 1$, is congruent to exactly one element of $R$. Let

$$S = \{q, 2q, \dots, hq\}.$$

Since $(p, q) = 1$, each element of $S$ is prime to $p$ and so is congruent to exactly one element of $R$; moreover, any two elements of $S$ are incongruent mod $p$. We first need the following lemma.


Lemma 2.6.3 If $n$ is the number of elements of $S$ congruent mod $p$ to negative elements of $R$ then $(q/p) = (-1)^n$.

Proof (Lemma 2.6.3) Suppose $a_1, \dots, a_n$ are the negative elements of $R$ congruent to elements of $S$ and $b_1, \dots, b_m$, with $m + n = h$, are the positive elements congruent to the remaining elements of $S$. The product of the elements of $S$ is $h!\,q^h$, so

$$h!\,q^h \equiv a_1 \cdots a_n b_1 \cdots b_m \pmod{p}.$$

We cannot have $-a_i = b_j$ for some $i, j$. Indeed, write $a_i \equiv rq$ and $b_j \equiv sq \pmod{p}$ with $1 \le r, s \le h$. Then $-a_i = b_j$ would give $a_i + b_j = 0 \equiv rq + sq \pmod{p}$, which would imply that $p \mid (r + s)q$. This is impossible since $2 \le r + s \le p - 1$ and $p \nmid q$. Therefore, $-a_1, \dots, -a_n, b_1, \dots, b_m$ give $h$ distinct positive integers all less than or equal to $h$. Hence

$$\{-a_1, \dots, -a_n, b_1, \dots, b_m\} = \{1, \dots, h\}.$$

It follows that

$$(-1)^n a_1 \cdots a_n b_1 \cdots b_m = h! \implies (-1)^n h!\,q^h \equiv h! \pmod{p}.$$

However, $(h!, p) = 1$, so

$$(-1)^n q^h \equiv 1 \pmod{p} \implies q^h = q^{\frac{p-1}{2}} \equiv (-1)^n \pmod{p}.$$

From Lemma 2.6.2, we have

$$(q/p) \equiv q^{\frac{p-1}{2}} \pmod{p} \implies (q/p) \equiv (-1)^n \pmod{p},$$

and since both sides are $\pm 1$ and $p > 2$, $(q/p) = (-1)^n$.
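The count $n$ in Lemma 2.6.3 is straightforward to compute. A residue $r \in \{1, \dots, p-1\}$ of $iq$ corresponds to the negative element $r - p$ of $R$ exactly when $r > h$. The following Python sketch (names are our own) compares $(-1)^n$ with the value given by Lemma 2.6.2 (3).

```python
def negative_residue_count(q, p):
    """n of Lemma 2.6.3: how many of q, 2q, ..., hq (h = (p-1)/2) are
    congruent mod p to a negative element of R = {-h, ..., -1, 1, ..., h}.
    A residue r in [1, p-1] corresponds to r - p in R exactly when r > h."""
    h = (p - 1) // 2
    return sum(1 for i in range(1, h + 1) if (i * q) % p > h)


def legendre_euler(a, p):
    # (a/p) via Lemma 2.6.2 part (3)
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


print(negative_residue_count(3, 7), legendre_euler(3, 7))   # 1 -1
```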


We are now going to compute $(q/p)$ in a different way. Let $[x]$ denote the greatest integer less than or equal to $x$. Notice that if $a, b \in \mathbb{Z}$ and $a = tb + r$ with $0 \le r < b$, then $\left[\frac{a}{b}\right] = t$ and so $a = \left[\frac{a}{b}\right]b + r$. Consider now the sum

$$M = \sum_{i=1}^{h} \left[\frac{iq}{p}\right].$$

$M$ is called a Gauss sum. The next lemma ties this Gauss sum to $(q/p)$.

Lemma 2.6.4 Let $p, q$ be distinct odd primes and let $M$ be defined as above. Then $(q/p) = (-1)^M$.

Proof As explained above, for each $i$ we have

$$iq = \left[\frac{iq}{p}\right]p + r_i, \quad 0 < r_i < p.$$

Let $R$ be as in Lemma 2.6.3. If $iq$ is congruent to a negative element $a_j$ of $R$ then $r_i = p + a_j$, while if $iq$ is congruent to a positive element $b_j$ then $r_i = b_j$. Then

$$\sum_{i=1}^{h} iq = p\sum_{i=1}^{h}\left[\frac{iq}{p}\right] + \sum_{j=1}^{n}(p + a_j) + \sum_{j=1}^{m} b_j = pM + np + \sum_{j=1}^{n} a_j + \sum_{j=1}^{m} b_j.$$

Further

$$\sum_{i=1}^{h} i = \frac{h(h+1)}{2} = \frac{p^2 - 1}{8}.$$

Let $P = \frac{p^2 - 1}{8}$. Plugging back into our sum over $\{iq\}$ we get

$$Pq = pM + np + \sum_{j=1}^{n} a_j + \sum_{j=1}^{m} b_j.$$

However, as we saw in the proof of Lemma 2.6.3,

$$\{-a_1, \dots, -a_n, b_1, \dots, b_m\} = \{1, \dots, h\} \implies -\sum_{j=1}^{n} a_j + \sum_{j=1}^{m} b_j = P.$$

Then

$$Pq = pM + np + P + 2\sum_{j=1}^{n} a_j \implies P(q - 1) = (M + n)p + 2\sum_{j=1}^{n} a_j.$$

Since $q$ is odd, $q - 1 \equiv 0 \pmod{2}$, and hence if we take the last equation mod 2 we get $(M + n)p \equiv 0 \pmod{2}$. Since $p$ is odd, this gives

$$M + n \equiv 0 \pmod{2},$$

which implies that $M, n$ are both even or both odd. It follows that $(-1)^M = (-1)^n$. From Lemma 2.6.3 we have $(q/p) = (-1)^n$ and hence $(q/p) = (-1)^M$, proving the second lemma.


We now interchange the roles of $p$ and $q$. Let $k = \frac{q-1}{2}$ and let $N$ be the Gauss sum for $q$,

$$N = \sum_{j=1}^{k} \left[\frac{jp}{q}\right].$$

Therefore, from Lemma 2.6.4 applied to $q$, we have $(p/q) = (-1)^N$. Hence

$$(p/q)(q/p) = (-1)^N(-1)^M = (-1)^{M+N}.$$

We will show that

$$M + N = hk = \frac{p-1}{2} \cdot \frac{q-1}{2} \qquad (2.6.1)$$

Fig. 2.2 Geometric argument for Quadratic Reciprocity [figure: the rectangle with corners (0, 0), (p/2, 0) and (0, q/2) on the axes]


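which will prove the quadratic reciprocity law.
To show (2.6.1) we will use a lovely geometric argument. Consider the lattice points, that is, points with integer coordinates, within the rectangle with corners at

$$(0, 0), \quad \left(\frac{p}{2}, 0\right), \quad \left(0, \frac{q}{2}\right), \quad \left(\frac{p}{2}, \frac{q}{2}\right)$$

as pictured in Figure 2.2. Here "within" means strictly inside, so these are the points $(x, y)$ with $1 \le x \le h$ and $1 \le y \le k$.

Let $T$ be the total number of lattice points within the rectangle. We will compute $T$ in two different ways. First, notice that $T = hk$ since $\left[\frac{p}{2}\right] = h$ and $\left[\frac{q}{2}\right] = k$.

Now consider the number below the diagonal. The equation of the diagonal is $y = \frac{q}{p}x$, and there are no lattice points on the diagonal inside the rectangle: if $y = \frac{qx}{p}$ were an integer with $1 \le x \le h$ then $p \mid qx$, which is impossible. For an integer $i$, the vertical line $x = i$ hits the diagonal at the point $\left(i, \frac{iq}{p}\right)$, and hence the number of lattice points along the line $x = i$ and below the diagonal is $\left[\frac{iq}{p}\right]$. It follows that the total number of lattice points below the diagonal is

$$\sum_{i=1}^{h} \left[\frac{iq}{p}\right] = M.$$

An analogous argument shows that the total number of lattice points above the diagonal is $N$. Therefore, $T = M + N$. Hence

$$M + N = hk$$

and the quadratic reciprocity law is proved.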
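The two counts in the geometric argument can be compared directly. This Python sketch (helper names are hypothetical) computes $M$, $N$ and the number of lattice points strictly inside the rectangle and below the diagonal, here for $p = 7$, $q = 3$. Running it over more pairs of primes gives a numerical check of (2.6.1) and of the reciprocity law.

```python
def gauss_sum(q, p):
    """M = sum_{i=1}^{h} [iq/p] with h = (p-1)/2."""
    h = (p - 1) // 2
    return sum(i * q // p for i in range(1, h + 1))


def points_below_diagonal(p, q):
    """Lattice points (x, y), 1 <= x <= h, 1 <= y <= k, with y < (q/p) x."""
    h, k = (p - 1) // 2, (q - 1) // 2
    return sum(1 for x in range(1, h + 1) for y in range(1, k + 1) if p * y < q * x)


def legendre_euler(a, p):
    # (a/p) by Lemma 2.6.2 part (3)
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


p, q = 7, 3
M, N = gauss_sum(q, p), gauss_sum(p, q)
print(M, N, ((p - 1) // 2) * ((q - 1) // 2))   # 1 2 3
print(points_below_diagonal(p, q) == M)         # True
```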

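Before giving some examples we note that by modifying slightly the proofs of Lemmas 2.6.3 and 2.6.4 we get the following, which allows us to compute $(2/p)$ for any odd prime $p$.

Theorem 2.6.2 If $p$ is an odd prime, then
1. $(-1/p) = (-1)^{\frac{p-1}{2}}$ and
2. $(2/p) = (-1)^{\frac{p^2-1}{8}}$.

Proof The first part (1) follows directly from Lemmas 2.6.1 and 2.6.2 taking $a = -1$.

For (2), although we assumed that $q$ was an odd prime in both Lemmas 2.6.3 and 2.6.4, the construction of the sets $R$ and $S$, the Gauss sum $M$, and the proof of Lemma 2.6.3 only required that $(q, p) = 1$. Now let $q = 2$. Then $\left[\frac{2i}{p}\right] = 0$ for $1 \le i \le h$ since $2i \le p - 1$, so from the definition of the Gauss sum $M = 0$. The identity $P(q - 1) = (M + n)p + 2\sum_{j} a_j$ from the proof of Lemma 2.6.4 did not use that $q$ is odd, and it now becomes $P = np + 2\sum_{j} a_j$. Taking it mod 2 (with $p$ odd) gives

$$\frac{p^2 - 1}{8} \equiv n \pmod{2}.$$

Then by Lemma 2.6.3, $(2/p) = (-1)^n = (-1)^{\frac{p^2-1}{8}}$.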
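As a sanity check of Theorem 2.6.2, the Python sketch below (our own helper names) compares both formulas with Euler's criterion for every odd prime up to 100. Python's three-argument `pow` accepts the negative base $-1$ and returns a residue in $[0, p)$.

```python
from math import isqrt


def legendre_euler(a, p):
    """(a/p) by Euler's criterion, Lemma 2.6.1; pow reduces negative a mod p."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def odd_primes_up_to(n):
    return [p for p in range(3, n + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]


for p in odd_primes_up_to(100):
    assert legendre_euler(-1, p) == (-1) ** ((p - 1) // 2)       # part 1
    assert legendre_euler(2, p) == (-1) ** ((p * p - 1) // 8)    # part 2
print("Theorem 2.6.2 agrees with Euler's criterion for odd p <= 100")
```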


With the quadratic reciprocity law and Theorem 2.6.2 it is relatively easy to compute $(a/p)$ for any $a$.

EXAMPLE 2.6.1 Determine $(870/7)$.
The prime factorization of 870 is $870 = 2 \cdot 3 \cdot 5 \cdot 29$. Then

$$(870/7) = (2/7)(3/7)(5/7)(29/7).$$

First,

$$(2/7) = (-1)^{\frac{49-1}{8}} = (-1)^6 = 1$$
$$(3/7) = -(7/3) \quad \text{since both 3 and 7 are congruent to 3 mod 4}$$
$$(7/3) = (1/3) = 1 \implies (3/7) = -1$$
$$(5/7) = (7/5) \quad \text{since } 5 \equiv 1 \pmod{4}$$
$$(7/5) = (2/5) = (-1)^{\frac{25-1}{8}} = (-1)^3 = -1 \implies (5/7) = -1.$$

Finally,

$$(29/7) = (1/7) = 1.$$

Putting these all together

$$(870/7) = (2/7)(3/7)(5/7)(29/7) = (1)(-1)(-1)(1) = 1$$

and hence 870 is a quadratic residue mod 7.
This was just an illustration. For a small prime like 7 it would be easier to reduce mod 7 and do it directly:

$$870 \equiv 2 \pmod{7} \implies (870/7) = (2/7) = 1.$$
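The steps of Example 2.6.1 can be mechanized: reduce $a$ mod $p$, factor it, and treat each prime factor with Theorem 2.6.2 or with quadratic reciprocity. Reciprocity replaces $(q/p)$ by $(p/q)$ and so reduces the size of the modulus. The Python sketch below is one way to do this under our own assumptions (trial-division factoring, so small inputs only; names hypothetical).

```python
def factor(n):
    """Prime factors of n >= 1 (with repetition) by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def legendre(a, p):
    """(a/p) for an odd prime p with (a, p) = 1."""
    a %= p                                   # Lemma 2.6.2 (2): reduce mod p
    result = 1
    for q in factor(a):                      # Lemma 2.6.2 (4): multiplicativity
        if q == 2:
            result *= (-1) ** ((p * p - 1) // 8)            # Theorem 2.6.2 (2)
        else:
            # Theorem 2.6.1: (q/p) = (p/q) * (-1)^{((p-1)/2)((q-1)/2)}
            result *= legendre(p, q) * (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return result


print(legendre(870, 7))   # 1, as in Example 2.6.1
```

Each recursive call has a strictly smaller prime modulus, so the recursion terminates.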
2.7 Exercises


2.1 Verify that the following are rings. Indicate which are commutative and which 
have identities. Which are integral domains? 


(a) The set of rational numbers. 

(b) The set of continuous functions on a closed interval [a, b] under ordinary 
addition and multiplication of functions. 

(c) The set of 2 x 2 matrices with integral entries. 

(d) The set nZ consisting of all integers which are multiples of the fixed 
integer n. 


2.2 (a) Show that in an ordered ring nonzero squares must be positive. Conclude 
that in an ordered ring with identity the multiplicative identity must be positive. 

(b) Show that the complex numbers under the ordinary operations cannot be 
ordered. 


2.3 Show that any ordered ring must be infinite. (Hint: Suppose a > 0 then 
a+a>0,a+a+a > 0 and continue). 


2.4 Prove by induction that there are $2^n$ subsets of a finite set with $n$ elements.
2.5 Prove that $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$.
2.6 Let $R$ be an ordered integral domain which satisfies the inductive property. Prove that $R$ is isomorphic to $\mathbb{Z}$.
(Hint: Let 1 be the multiplicative identity in $R$. Define $2 \cdot 1 = 1 + 1$ and inductively $n \cdot 1 = (n - 1) \cdot 1 + 1$ in $R$. Define

$$\bar{R} = \{n \cdot 1 \in R ; n \in \mathbb{Z}\}$$

and let $f : \mathbb{Z} \to \bar{R}$ by $f(n) = n \cdot 1$. Show first that $f$ is an isomorphism from $\mathbb{Z}$ to $\bar{R}$. Then use the inductive property in $R$ to show that $\bar{R}$ is all of $R$.)


2.7 Prove the remaining parts of Theorem 2.2.1. 


2.8 Find the GCD and LCM of the following pairs of integers and then express 
the GCD as a linear combination 


(a) 78 and 30, 

(b) 175 and 35, 

(c) 380 and 127. 

2.9 Prove that if $a = qb + r$ then $(a, b) = (b, r)$.


2.10 Prove that if $d = (a, b)$ then $\frac{a}{d}$ and $\frac{b}{d}$ are relatively prime.
2.11 Show that if $(a, b) = c$ then $(a^n, b^n) = c^n$. (Hint: The easiest method is to use the fundamental theorem of arithmetic.)


2.12 Redo Problem 2.8 using the prime decomposition of each integer. 


2.13 Show that an integer is divisible by 3 if and only if the sum of its digits (in 
decimal expansion) is divisible by 3. (Hint: Write out the decimal expansion and take 
everything modulo 3.) 


2.14 Let $F$ be a field and let $F[x]$ denote the ring of polynomials over $F$. Prove that if $f(x), g(x) \in F[x]$ with $g(x) \ne 0$ then there exist unique polynomials $q(x), r(x) \in F[x]$ such that

$$f(x) = q(x)g(x) + r(x), \quad r(x) = 0 \text{ or } \deg(r(x)) < \deg(g(x)).$$


This is the division algorithm for polynomials. (Hint: Model the proof on the proof 
for the integers.) 


2.15 Suppose p(x) is a polynomial over F and p(r) = 0. Show that p(x) = 
(x — r)h(x) where h(x) is another polynomial of degree one less. (Use the division 
algorithm.) 


2.16 Let $g(x), f(x) \in F[x]$. Then their greatest common divisor or GCD is the monic polynomial $d(x)$ (leading coefficient 1) such that $d(x)$ divides both $f(x)$ and $g(x)$, and if $d_1(x)$ is any other common divisor of $g(x)$ and $f(x)$ then $d_1(x)$ divides $d(x)$. Show that the GCD of two polynomials exists and is the monic polynomial of least degree which can be expressed as a linear combination of $f(x)$ and $g(x)$. That is,

$$d(x) = h(x)f(x) + k(x)g(x)$$


and d(x) has the least degree of any linear combination of this form. (Hint: Again 
model the proof on the proof for the integers.) 


2.17 Prove Euclid’s lemma for polynomials, that is, if d(x) divides f (x)g(x) and 
(d(x), g(x)) = 1 then d(x) divides f(x). 


2.18 A polynomial $p(x)$ of positive degree over a field $F$ is a prime polynomial or irreducible polynomial if it cannot be expressed as a product of two polynomials of positive degree over $F$. Prove that any nonconstant polynomial $f(x) \in F[x]$, where $F$ is a field, can be decomposed as a product of prime polynomials. Further, this decomposition is unique except for ordering and unit factors. This is the unique factorization theorem for polynomial rings over fields. (Hint: Again model the proof on the proof of the fundamental theorem of arithmetic.)


2.19 Suppose $p(x)$ is a polynomial over $F$ and the degree of $p(x)$ is $n$. Prove that $p(x)$ can have at most $n$ distinct roots over $F$.

2.20 Mimic the results in Problems 2.14 through 2.18 for general Euclidean domains (see the definition on p. 21) and then use this to prove Theorem 2.3.6.


2.21 Show that the Gaussian integers $\mathbb{Z}[i]$ form a Euclidean domain with $N(a + bi) = a^2 + b^2$. This shows that the Gaussian integers are a unique factorization domain.


2.22 Prove part (c) of Theorem 2.6.2: If $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$ then $ac \equiv bd \pmod{n}$.


2.23 Verify the remaining ring properties to show that for any positive integer $n$, $\mathbb{Z}_n$ is a commutative ring with an identity.


2.24 Find the multiplicative inverse if it exists 
(a) of 13 in Zy7, 

(b) of 17 in Zo, 

(c) of 6 in Z30. 


2.25 Solve the linear congruences 
(a) 4x + 6 = 2 in Z, 

(b) 5x +9 = 12 in Zu, 

(c) 3x + 18 = 27 in Zao. 


2.26 Find $\phi(n)$ for

(a) $n = 117$,
(b) $n = 526$,
(c) $n = 138$.


2.27 Determine the units and write down the group table for the unit group $U(\mathbb{Z}_n)$ for


(a) Zi, 
(b) Za¢6. 


2.28 Verify Theorem 2.4.8 for 
(a)n = 26, 
(b) n = 88. 


2.29 Prove Theorem 2.5.3, that is, for any natural number $n$ let $(\mathbb{Z}_n, +)$ denote the additive group of $\mathbb{Z}_n$ and let $U(\mathbb{Z}_n)$ be the group of units of $\mathbb{Z}_n$. Let $n = n_1 n_2 \cdots n_k$ be a factorization of $n$ with pairwise relatively prime factors. Then

$$(\mathbb{Z}_n, +) \cong (\mathbb{Z}_{n_1}, +) \times (\mathbb{Z}_{n_2}, +) \times \cdots \times (\mathbb{Z}_{n_k}, +)$$

$$U(\mathbb{Z}_n) \cong U(\mathbb{Z}_{n_1}) \times \cdots \times U(\mathbb{Z}_{n_k}).$$

2.30 Prove that if an integer is congruent to 2 modulo 3 then it must have a prime factor congruent to 2 modulo 3.


2.31 Prove that if $p$ is an odd prime then there exist positive integers $x, y$ such that $p = x^2 - y^2$.


2.32 Prove that if $bc$ is a perfect square for integers $b, c$ and $(b, c) = 1$ then both $b$ and $c$ are perfect squares.


2.33 Determine a primitive root modulo 11. 


2.34 We outline a proof of Theorem 2.4.14: An integer $n$ will have a primitive root modulo $n$ if and only if

$$n = 2, 4, p^k, 2p^k$$

where $p$ is an odd prime.

(a) Show that if $(m, n) = 1$ with $m > 2$, $n > 2$ then there is no primitive root modulo $mn$.

(b) Show that there is no primitive root modulo $2^k$ for $k > 2$.

(c) Prove that if $p$ is an odd prime then there exists a primitive root $a$ mod $p$ such that $a^{p-1}$ is not congruent to 1 modulo $p^2$. (Hint: Let $a$ be a primitive root mod $p$. Then $a + p$ is also a primitive root. Show that either $a$ or $a + p$ satisfies the result.)

(d) Prove that there exists a primitive root modulo $p^k$ for any $k \ge 2$. (Hint: Let $a$ be the primitive root mod $p$ from part (c). Then this is a primitive root mod $p^k$ for any $k \ge 2$.)

(e) Prove that if $a$ is a primitive root mod $p^k$ then, if $a$ is odd, $a$ is also a primitive root mod $2p^k$. If $a$ is even then $a + p^k$ is a primitive root modulo $2p^k$.


2.35 Use the primality test based on Fermat’s theorem to show that 1053 is not 
prime. 


2.36 If $m > 2$ show that $\phi(m)$ is even.
2.37 Prove that $\phi(n^2) = n\phi(n)$ for any positive integer $n$.

2.38 Prove that if $n > 2$ then

$$\sum_{(m,n)=1,\ 0<m<n} m = \frac{n\phi(n)}{2}.$$

2.39 Prove that if $n$ has $k$ distinct odd prime factors then $2^k \mid \phi(n)$.


Chapter 3 
The Infinitude of Primes 


3.1 The Infinitude of Primes 


The two most striking characteristics of the sequence of primes are that there are many of them but that their density is rather slim. From Euclid's theorem (Theorem 2.3.1) there are infinitely many primes; in fact there are infinitely many in any arithmetic progression $a, a + d, a + 2d, \dots$ with $(a, d) = 1$. This latter fact was proved by Dirichlet and is known as Dirichlet's Theorem. However, despite the fact that the primes are so numerous, their density among the natural numbers gets slim. As mentioned before, if $x$ is a natural number and $\pi(x)$ represents the number of primes less than or equal to $x$, then asymptotically this function behaves as the function $\frac{x}{\ln x}$. This result is known as the prime number theorem. Besides being a startling result, the proof of the prime number theorem, done independently by Hadamard and de la Vallée Poussin, became the genesis of analytic number theory. In this chapter we will discuss various aspects of the infinitude of primes. The prime number theorem will be introduced in the next chapter.

As a starting point we will give an array of proofs of the infinitude of primes: some are direct, some involve analysis and some come from quite different directions. Hopefully seeing these proofs will both shed some light on the nature of the sequence of primes and at the same time show the complexity of this rather straightforward result. Included among these will be several simple cases of Dirichlet's Theorem, which we will prove in its entirety in Section 3.3.


3.1.1 Some Direct Proofs and Variations 


The purpose of this chapter is then to present a wide array of proofs that the set of 
primes is infinite. Each of these other proofs will hopefully shed further light on the 
nature of the primes and the nature of the integers. We first restate the basic theorem 
which was given in the last chapter as Theorem 2.3.1. 


© Springer International Publishing AG 2016 59 
B. Fine and G. Rosenberger, Number Theory, 
DOI 10.1007/978-3-319-43875-7_3 




Theorem 3.1.1 There are infinitely many primes. 


In the last chapter we gave two proofs of this result, the first of which goes back to Euclid. Recall that Euclid's argument went like this: suppose that there are only finitely many primes $p_1, \dots, p_n$. Each of these is positive, so we can form the positive integer

$$N = p_1 p_2 \cdots p_n + 1.$$

$N$ has a prime decomposition, so in particular there is a prime $p$ which divides $N$. Then

$$p \mid (p_1 p_2 \cdots p_n + 1).$$

Since the only primes are assumed to be $p_1, p_2, \dots, p_n$, it follows that $p = p_i$ for some $i = 1, \dots, n$. But then $p \mid p_1 p_2 \cdots p_i \cdots p_n$, so $p$ cannot divide $p_1 \cdots p_n + 1$, which is a contradiction. Therefore $p$ is not one of the given primes, showing that the list of primes must be endless. Notice that in this argument we could just as easily have worked with $N = p_1 \cdots p_n - 1$.

We also presented the following variation of Euclid's argument. Again suppose that there are only finitely many primes $p_1, \dots, p_n$. Certainly $n \ge 2$. Let $P = \{p_1, \dots, p_n\}$. Divide $P$ into two disjoint nonempty subsets $P_1, P_2$. Now consider the number $m = q_1 + q_2$ where $q_1$ is the product of all the primes from $P_1$ and $q_2$ is the product of all the primes from $P_2$. Let $p$ be a prime divisor of $m$. Since $p \in P$, it follows that $p$ divides either $q_1$ or $q_2$ but not both. But then $p$ does not divide $m$, giving a contradiction. Therefore $p$ is not one of the given primes and the number of primes must be infinite.

We now give some further variations of Euclid's basic proof. None of these proofs uses analysis. In the next section we prove Theorem 3.1.1 with some analytic ideas. These are precursors to both the proof of the prime number theorem and the proof of Dirichlet's theorem.


Proof (1a) (Using Factorials). Again suppose that $p_1, \dots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Certainly $p_i \le N$ for each $i$. Let $q$ be the smallest prime divisor of $N! + 1$. If $q \le N$ then $q$ certainly divides $N!$, so $q$ cannot divide $N! + 1$. Therefore $q > N$ and hence $q > p_i$ for $i = 1, \dots, n$. Hence $q$ is not one of the $p_i$ and the sequence of primes is infinite.

Notice that the fact that the smallest prime divisor of $N! + 1$ is greater than $N$ did not depend on $N$ being a product of primes. Hence this proof can be varied as follows.

Proof (1b) (Again Using Factorials) For each $n \ge 1$ let $q_n$ be the smallest prime divisor of $n! + 1$. Exactly as in the previous proof we must have $q_n > n$, and hence there cannot be finitely many primes.
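Proof (1b) can be watched in action for small $n$. The Python sketch below (helper name ours) computes the smallest prime divisor $q_n$ of $n! + 1$ by trial division and checks that $q_n > n$.

```python
from math import factorial


def smallest_prime_factor(n):
    """Smallest prime divisor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


for n in range(1, 13):
    q_n = smallest_prime_factor(factorial(n) + 1)
    assert q_n > n          # as in Proof (1b): no prime <= n divides n! + 1
    print(n, factorial(n) + 1, q_n)
```

For example, $4! + 1 = 25$ has smallest prime divisor 5 and $6! + 1 = 721 = 7 \cdot 103$ has smallest prime divisor 7.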
We get another simple variation by using the sum $\sum_i \frac{1}{p_i}$ and assuming the set of primes is finite. In the next section we show that this sum actually diverges, which also shows that the primes are infinite.

Proof (2) (Using Sums) As before suppose that $p_1, \dots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Set

$$a = \sum_{i=1}^{n} \frac{1}{p_i} = \frac{1}{N}\sum_{i=1}^{n} \frac{N}{p_i}.$$

Then $aN = \sum_{i=1}^{n} \frac{N}{p_i}$ is an integer greater than 1 (since 2 and 3 are primes, $n \ge 2$), so it has a prime divisor, which by assumption must be some $p_j$. Then $p_j \mid aN$ and $p_j \mid \frac{N}{p_i}$ for $i \ne j$. Since $N$ is a product of distinct primes, it follows that $p_j \mid \frac{N}{p_j}$, which is a contradiction.
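A concrete instance of the quantity $aN = \sum N/p_i$ from Proof (2) shows why none of the $p_j$ can divide it. This Python sketch uses our own helper name and the hypothetical list of primes 2, 3, 5, 7.

```python
from math import prod


def sum_of_cofactors(primes):
    """aN = sum of N/p_i, where N is the product of the given distinct primes."""
    N = prod(primes)
    return sum(N // p for p in primes)


primes = [2, 3, 5, 7]
aN = sum_of_cofactors(primes)          # 105 + 70 + 42 + 30
print(aN, [p for p in primes if aN % p == 0])   # 247 []
```

Each $p_j$ divides every term except $N/p_j$, so it leaves a nonzero remainder; here $247 = 13 \cdot 19$.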


The next proof involves the use of the Euler phi function. Recall from Section 2.5 that for a positive integer $n$,

$$\phi(n) = \text{number of positive integers } x \le n \text{ with } (x, n) = 1.$$

For a prime $p$ we have $\phi(p) = p - 1$, and if $(a, b) = 1$ then $\phi(ab) = \phi(a)\phi(b)$.


Proof (3) (Using the Euler Phi Function) Suppose that $p_1, \dots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Notice that if $p_i > 2$ then $\phi(p_i) = p_i - 1 > 1$.

If $1 < x \le N$ then $x$ must have a prime divisor, say $p_i$, and hence $p_i$ is a common divisor of $x$ and $N$. It follows that $(x, N) \ne 1$, that is, they are not relatively prime. By definition then we must have $\phi(N) = 1$, the only integer counted being $x = 1$. On the other hand, since 3 is one of the $p_i$,

$$\phi(N) = \phi(p_1 \cdots p_n) = \phi(p_1)\phi(p_2)\cdots\phi(p_n) = (p_1 - 1)\cdots(p_n - 1) > 1,$$

a contradiction.


The final proof of this first section is somewhat different from the others and involves integral polynomials. Let $\mathbb{Z}[x]$ denote the set of polynomials with integral coefficients and let $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$.

Lemma 3.1.1 For each nonconstant polynomial $f(x) \in \mathbb{Z}[x]$, the set of prime divisors of the integers $\{f(k) ; k \in \mathbb{N}_0\}$ is infinite. In particular the total number of primes is infinite.

Proof Suppose that

$$f(x) = a_0 + a_1 x + \cdots + a_m x^m$$

and assume that for the set $\{f(k) ; k \in \mathbb{N}_0\}$ the number of prime divisors which occur for some $f(k)$ is finite. Let $U = \{p_1, \dots, p_n\}$ be this set of prime divisors and let $D = p_1 \cdots p_n$. Without loss of generality suppose $a_0 \ne 0$. Choose an integer $t$ so that $p_i^t$ does not divide $f(0) = a_0$ for any $i$. Since every prime divisor of $a_0 = f(0)$ lies in $U$, we must have $a_0 \mid D^t$, that is, $D^t = a_0 b$ for some $b \in \mathbb{Z}$. For $k \ge 1$ we have

$$f(kD^{t+1}) = \sum_{i=1}^{m} a_i k^i D^{i(t+1)} + a_0 = a_0\left(\sum_{i=1}^{m} a_i k^i b D^{(i-1)t + i} + 1\right) = a_0 M.$$

Every term of the last sum is divisible by $D$, so $M \equiv 1 \pmod{D}$. For $k$ large enough $|M| > 1$, so $M$ has a prime divisor $p$. Since $M \equiv 1 \pmod{D}$, $p$ does not divide $D$, and hence $p \notin U$, a contradiction.
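The construction in the proof can be tried on a specific polynomial. In this purely illustrative Python sketch we take the hypothetical example $f(x) = x^2 + x + 6$ with $U = \{2, 3\}$ and $t = 2$, so that $D = 6$ and $a_0 = 6$ divides $D^2$. The value $M$ comes out congruent to 1 mod $D$ and so has prime divisors outside $U$. The function name is ours.

```python
from math import prod


def lemma_3_1_1_M(coeffs, U, t, k):
    """Hypothetical illustration of the proof of Lemma 3.1.1.

    coeffs = [a_0, ..., a_m] with a_0 != 0; U is a finite list of primes that
    contains every prime divisor of a_0, and t is chosen so that no p^t (p in U)
    divides a_0. Returns M = f(k D^(t+1)) / a_0 together with M mod D.
    """
    D = prod(U)
    a0 = coeffs[0]
    assert D ** t % a0 == 0                   # a_0 divides D^t
    x = k * D ** (t + 1)
    value = sum(a * x ** i for i, a in enumerate(coeffs))
    M = value // a0                           # exact: a_0 divides f(k D^(t+1))
    return M, M % D


print(lemma_3_1_1_M([6, 1, 1], [2, 3], 2, 1))   # (7813, 1), and 7813 = 13 * 601
```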


3.1.2 Some Analytic Proofs and Variations 


Both the proof of the prime number theorem and the proof of Dirichlet’s theorem 
depend heavily on the use of analysis—both real and complex. The introduction of 
analytic methods into number theory can be traced back basically to the following 
two results of Euler which also imply that the sequence of primes is infinite. 


Theorem 3.1.2 The sum

$$\sum_{p \text{ prime}} \frac{1}{p}$$

diverges. In particular the sequence of primes is infinite.

Proof Clearly, if the series

$$\sum_{p \text{ prime}} \frac{1}{p}$$

diverges, then there must be infinitely many primes, for otherwise this would be a finite sum.

We present two proofs that this sum diverges. The first is direct, while the second introduces the Riemann zeta function, which will be crucial in investigations of the density of primes.

Let $p_1, \dots, p_k, \dots$ be the sequence of primes in increasing order, which at this point may or may not be infinite. We first need the following fact:


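Lemma 3.1.2 If $p_1, \dots, p_k, \dots$ is the sequence of primes in increasing order then $p_n \le 2^{2^{n-1}}$ for all $n$ and $p_n < 2^{2^{n-1}}$ for all $n > 1$.

Proof (Lemma 3.1.2) By induction. $p_1 = 2 \le 2^1$, so it is true for $n = 1$. Further, no other prime is even, so $p_k \ne 2^{2^{k-1}}$ if $k \ne 1$. Suppose then that $p_n \le 2^{2^{n-1}}$ for $n = 1, \dots, k$ and consider $p_{k+1}$. Now, as in Euclid's proof of the infinitude of primes, $K = p_1 \cdots p_k + 1$ must have a prime divisor which is not one of $p_1, \dots, p_k$; this prime is larger than $p_k$, so $p_{k+1} \le K$. Hence

$$p_{k+1} \le p_1 p_2 \cdots p_k + 1 \le 2^{2^0} 2^{2^1} \cdots 2^{2^{k-1}} + 1 = 2^{2^k - 1} + 1 \le 2^{2^k}.$$

Therefore the assertion is true for all $n$ by induction.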
Proof Now we continue the proof of Theorem 3.1.2. Assume that

$$\sum_{p \text{ prime}} \frac{1}{p}$$

converges. Note that we are not assuming here that there are infinitely many primes. If there are only finitely many then this is a finite sum. Since the series converges, its tails tend to 0, so there must be an $N$ such that

$$\sum_{i > N} \frac{1}{p_i} < \frac{1}{2}.$$

Fix this value $N$, and let $Q_N(x)$, for any natural number $x$, be the number of positive integers less than or equal to $x$ which are not divisible by any of the primes $p_{N+1}, p_{N+2}, \ldots$. For a given prime $p$ the number of integers $n \le x$ divisible by $p$ is at most $x/p$. Then it follows that for any integer $x$,

$$x - Q_N(x) \le \frac{x}{p_{N+1}} + \frac{x}{p_{N+2}} + \cdots < \frac{x}{2},$$

since we assumed that $\sum_{i > N} 1/p_i < 1/2$. Therefore $x/2 < Q_N(x)$. On the other hand if $n \le x$ and $n$ is not divisible by any of $p_{N+1}, p_{N+2}, \ldots$ then $n = n_1^2m$ where $m$ is squarefree. Hence $m = 2^{e_1}3^{e_2}\cdots p_N^{e_N}$ where each $e_i = 0$ or $1$, so there are at most $2^N$ choices for $m$. Further, since $n_1^2 \le x$, there are at most $\sqrt{x}$ choices for $n_1$. It follows then that

$$\frac{x}{2} < Q_N(x) \le 2^N\sqrt{x}.$$


Since $N$ is fixed this is a contradiction for $x$ large enough (namely for $x \ge 4^{N+1}$), and hence the sum

$$\sum_{p \text{ prime}} \frac{1}{p}$$

diverges.
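The two counting estimates used above can be checked numerically. The sketch below (our own helper names; brute force, so only for small $x$) computes $Q_N(x)$ directly, compares it with the bound $2^N\sqrt{x}$, and implements the decomposition $n = n_1^2m$ with $m$ squarefree.

```python
from math import isqrt


def first_primes(N):
    """The first N primes p_1 < ... < p_N (simple incremental test)."""
    primes, n = [], 2
    while len(primes) < N:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def Q(N, x):
    """Count 1 <= n <= x not divisible by any of p_{N+1}, p_{N+2}, ...,
    i.e. all of whose prime factors lie among p_1, ..., p_N."""
    ps = first_primes(N)
    count = 0
    for n in range(1, x + 1):
        r = n
        for p in ps:
            while r % p == 0:
                r //= p
        count += (r == 1)
    return count


def squarefree_split(n):
    """Write n = n1**2 * m with m squarefree; returns (n1, m).
    Taking the largest n1 with n1^2 | n forces m to be squarefree."""
    for n1 in range(isqrt(n), 0, -1):
        if n % (n1 * n1) == 0:
            return n1, n // (n1 * n1)


print(Q(2, 1000), 2**2 * isqrt(1000))   # 3-smooth numbers up to 1000 vs. the bound
print(squarefree_split(72))             # 72 = 6^2 * 2
```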


We now give a second proof of Theorem 3.1.2 which introduces the ideas of the 
Riemann Zeta Function and Euler products which are fundamental in some of our 
further discussions. 


Proof (of Theorem 3.1.2) For a real variable $s > 1$ we define the Riemann Zeta Function by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$

From the classical $p$-series test this will converge if $s > 1$ and hence will define a function. When we discuss the prime number theorem in the next chapter we will extend this function to complex variables. Since $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges it follows that $\zeta(s) \to \infty$ as $s \to 1^+$. From the fundamental theorem of arithmetic each $n$ can be expressed uniquely as a product of primes and hence the zeta function can be written as the following product

$$\zeta(s) = \prod_{p \text{ prime}} \Big(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\Big).$$

However we have the geometric series converging so that

$$1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots = \frac{1}{1 - p^{-s}}.$$

Therefore

$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$$


These last two products are called Euler products after Euler who first used them 
in his investigations. 

Now if the sequence of primes were finite then the Euler product would be a finite product, and hence $\zeta(s)$ would stay bounded as $s \to 1^+$. However, as we pointed out, $\zeta(s) \to \infty$ as $s \to 1^+$, and hence the sequence of primes is infinite.
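A minimal numerical sketch of the Euler product: at $s = 2$ both a partial sum of $\sum 1/n^s$ and a truncated product over primes should approach $\zeta(2) = \pi^2/6 \approx 1.6449$ (a classical value not proved in this section). The cut-offs are arbitrary, the helper names are our own, and floating point limits the precision. The last check below also illustrates the bound $\ln \zeta(s) \le 2\sum_p p^{-s}$ used in the second proof.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def zeta_partial_sum(s, terms):
    """sum_{n=1}^{terms} 1/n^s"""
    return sum(n ** (-s) for n in range(1, terms + 1))


def euler_product(s, bound):
    """prod_{p <= bound} 1/(1 - p^(-s))"""
    result = 1.0
    for p in primes_up_to(bound):
        result /= 1.0 - p ** (-s)
    return result


print(zeta_partial_sum(2, 100_000), euler_product(2, 10_000), math.pi ** 2 / 6)
ps = primes_up_to(1000)
print(math.log(euler_product(1.5, 1000)) <= 2 * sum(p ** -1.5 for p in ps))
```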

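For the second proof of Theorem 3.1.2 consider the inequality

$$\ln\left(\frac{1}{1 - x}\right) \le \frac{x}{1 - x},$$

which holds if $0 < x < 1$ (see the exercises). It follows that for $0 < x \le \frac{1}{2}$,

$$\ln\left(\frac{1}{1 - x}\right) \le 2x.$$

Then using the Euler product representation of $\zeta(s)$ and taking logarithms (note that $p^{-s} \le \frac{1}{2}$ for every prime $p$ and every $s > 1$),

$$\ln(\zeta(s)) = \sum_{p \text{ prime}} \ln\left(\frac{1}{1 - p^{-s}}\right) \le 2\sum_{p \text{ prime}} p^{-s}.$$

If $\sum_{p \text{ prime}} \frac{1}{p}$ were convergent then $2\sum_p p^{-s} < 2\sum_p p^{-1}$ for all $s > 1$ and it would follow that $\zeta(s)$ stays bounded as $s \to 1^+$, a contradiction. Therefore the sum diverges.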


The final results in this section give lower bounds on $\pi(x)$, the number of primes less than or equal to $x$. These lower bounds further imply the infinitude of primes.


Theorem 3.1.3 For any natural number $x \ge 2$ we have

$$\pi(x) > \ln\ln x.$$


Proof Let $p_1, \ldots, p_k, \ldots$ be the sequence of primes in increasing order. Recall that $p_n \le 2^{2^{n-1}}$ for all $n \ge 1$. For a given $x$ choose $k$ such that

$$2^{2^{k-1}} \le x < 2^{2^k}.$$

Therefore since $p_k \le 2^{2^{k-1}}$ we have

$$k \le \pi(2^{2^{k-1}}) \le \pi(x).$$

From $x < 2^{2^k} < e^{e^k}$ it follows that

$$\ln\ln x < k \le \pi(x).$$


Using the Fundamental Theorem of Arithmetic we can arrive at a separate but 
similar lower bound. 


Theorem 3.1.4 For any natural number $x > 21$ we have

$$\pi(x) > \frac{\ln x}{2\ln\ln x}.$$


Proof For fixed $x$ let $p_i$ run over all the primes less than or equal to $x$. Then from the Fundamental Theorem of Arithmetic the number of integral solutions of the inequality

$$\prod_i p_i^{e_i} \le x$$

with all $e_i \ge 0$ is precisely $x$ (one solution for each $n = 1, \ldots, x$). On the other hand the number of solutions is at most the product of the numbers of choices for each $e_i$. Since for a solution $p_i^{e_i} \le x$, the number of choices for $e_i$ is at most

$$1 + \frac{\ln x}{\ln p_i} \le 1 + \frac{\ln x}{\ln 2} < (\ln x)^2$$

for $x > 20$. Therefore

$$x < (\ln x)^{2\pi(x)},$$

and taking logarithms gives $\ln x < 2\pi(x)\ln\ln x$, which is the assertion.


Corollary 3.1.1 $\pi(x) \to \infty$ as $x \to \infty$. In particular the sequence of primes is infinite.


Proof From Theorem 3.1.3 we have $\pi(x) > \ln\ln x$ for $x \ge 2$. The latter sequence becomes infinite with $x$. Similarly from Theorem 3.1.4 we have $\pi(x) > \frac{\ln x}{2\ln\ln x}$ for $x > 21$ and this latter sequence also becomes infinite with $x$.
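A brute-force check of Lemma 3.1.2 and Theorems 3.1.3 and 3.1.4 over a finite range can be sketched as follows. It is only a sanity check, not a proof; the function name and the range `limit` are arbitrary choices of ours.

```python
import math
from bisect import bisect_right


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def check_prime_bounds(limit):
    primes = primes_up_to(limit)
    # Lemma 3.1.2: p_n <= 2^(2^(n-1)); checked for the first 10 primes only,
    # since the right-hand side grows doubly exponentially.
    for n, p in enumerate(primes[:10], start=1):
        assert p <= 2 ** (2 ** (n - 1))
    for x in range(2, limit + 1):
        pi_x = bisect_right(primes, x)            # pi(x) = number of primes <= x
        assert pi_x > math.log(math.log(x))       # Theorem 3.1.3
        if x > 21:                                # Theorem 3.1.4
            assert pi_x > math.log(x) / (2 * math.log(math.log(x)))
    return True


print(check_prime_bounds(20_000))
```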


3.1.3 The Fermat and Mersenne Numbers 


In the next several subsections we will examine primes in relation to certain special 
sequences of integers. Although not directly related, this path will lead ultimately to 
Dirichlet’s Theorem. 

The first such sequence we consider is the sequence of Fermat numbers.


Definition 3.1.1 The Fermat numbers are the sequence $(F_n)$ of positive integers defined by

$$F_n = 2^{2^n} + 1, \quad n = 0, 1, 2, 3, \ldots$$

If a particular $F_n$ is prime it is called a Fermat prime.


Fermat believed that all the numbers in this sequence were primes. In fact $F_0, F_1, F_2, F_3, F_4$ are all prime but $F_5$ is composite and divisible by 641 (see exercises). It is still an open question whether or not there are infinitely many Fermat primes. It has been conjectured that there are only finitely many. On the other hand, if a number of the form $2^k + 1$ is a prime for some integer $k$ then it must be a Fermat prime.


Theorem 3.1.5 If $a \ge 2$ and $a^n + 1$ is a prime then $a$ is even and $n = 2^m$ for some nonnegative integer $m$. In particular if $p = 2^k + 1$ is a prime then $k = 2^n$ for some $n$ and $p$ is a Fermat prime.


Proof If $a$ is odd then $a^n + 1$ is even and greater than 2, and hence not a prime. Suppose then that $a$ is even and $n = kl$ with $k$ odd and $k \ge 3$. Then

$$\frac{a^{kl} + 1}{a^l + 1} = a^{l(k-1)} - a^{l(k-2)} + \cdots - a^l + 1.$$




Therefore $a^l + 1$ divides $a^n + 1$ if $k \ge 3$, and $1 < a^l + 1 < a^n + 1$. Hence if $a^n + 1$ is a prime, $n$ has no odd divisor $k \ge 3$, and we must have $n = 2^m$.


We now use the Fermat numbers to get another proof of the infinitude of primes. 
We first need the following. 


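Lemma 3.1.3 Let $(F_n)$ be the sequence of Fermat numbers. Then if $m \neq n$ we have $(F_n, F_m) = 1$.


Proof Suppose that $n > m$ and suppose that $d \mid F_n$, $d \mid F_m$. Writing $y = 2^{2^m}$, so that $2^{2^n} = y^{2^{n-m}}$, we have

$$\frac{F_n - 2}{F_m} = \frac{2^{2^n} - 1}{2^{2^m} + 1} = \frac{y^{2^{n-m}} - 1}{y + 1} = y^{2^{n-m}-1} - y^{2^{n-m}-2} + \cdots + y - 1,$$

since $2^{n-m}$ is even. Therefore $F_m \mid F_n - 2$ and hence $d \mid F_n - 2$. Since $d \mid F_n$ it follows that $d \mid 2$. But $d \neq 2$ since both $F_n$ and $F_m$ are odd, so $d = 1$.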


This now yields another proof of the infinitude of primes. Since the members of the infinite sequence $(F_n)$ are pairwise coprime and each $F_n$ must have at least one prime divisor, it follows directly that the number of primes must be infinite.
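The facts about Fermat numbers used above are easy to confirm for small $n$. This sketch checks that $F_0, \ldots, F_4$ are prime, that $641 \mid F_5$, and that $F_0, \ldots, F_7$ are pairwise coprime. It also checks the standard identity $F_0F_1\cdots F_{n-1} = F_n - 2$, a compact way of seeing $F_m \mid F_n - 2$ for $m < n$; that identity is a well-known fact we add for illustration, not a claim from the text.

```python
from itertools import combinations
from math import gcd, prod


def fermat(n):
    """F_n = 2^(2^n) + 1"""
    return 2 ** (2 ** n) + 1


def is_prime(n):
    """Trial division; adequate for the small cases used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


F = [fermat(n) for n in range(8)]
print([is_prime(f) for f in F[:5]], F[5] % 641)              # all True, remainder 0
print(all(prod(F[:n]) == F[n] - 2 for n in range(1, 8)))     # F_0...F_{n-1} = F_n - 2
print(all(gcd(a, b) == 1 for a, b in combinations(F, 2)))    # pairwise coprime
```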

We can also get the following variation of this method. Suppose $a \in \mathbb{N}$ with $a \ge 2$. Define the sequence $A_n = a^{2^n} + 1$. Then it can be proved that (see exercises)

(1) If $n > m \ge 1$ then $(a^{2^m} + 1) \mid (a^{2^n} - 1)$.

(2) $(A_n, A_m) = 1$ for $n \neq m$ if $a$ is even, and $(A_n, A_m) = 2$ for $n \neq m$ if $a$ is odd.

Then the same proof as used with the Fermat numbers goes through. In fact any infinite sequence $(a_n)$ of integers greater than 1 with $(a_i, a_j) = 1$ for $i \neq j$ will yield a similar proof. As an example start with $(m, n) = 1$ and let $a_0 = m + n$. Then define inductively

$$a_k = a_{k-1}^2 - ma_{k-1} + m.$$

Then it can be proved that $(a_i, a_j) = 1$ if $i \neq j$ and this sequence can be used in the same proof.
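The sequence $a_k = a_{k-1}^2 - ma_{k-1} + m$ can be generated and tested directly. This sketch assumes $m, n$ positive with $(m, n) = 1$. For $m = n = 1$ it produces $2, 3, 7, 43, 1807, \ldots$, and for $m = 2$, $n = 1$ it reproduces the Fermat numbers. The check below relies on an observation of ours, easily proved by induction from $a_k - m = a_{k-1}(a_{k-1} - m)$: namely $a_k = n\,a_0a_1\cdots a_{k-1} + m$, so a common divisor of $a_k$ and an earlier $a_i$ divides $m$.

```python
from itertools import combinations
from math import gcd, prod


def coprime_sequence(m, n, length):
    """a_0 = m + n and a_k = a_{k-1}^2 - m*a_{k-1} + m (assumes gcd(m, n) = 1)."""
    seq = [m + n]
    while len(seq) < length:
        a = seq[-1]
        seq.append(a * a - m * a + m)
    return seq


print(coprime_sequence(1, 1, 5))     # starts 2, 3, 7, 43, ...
print(coprime_sequence(2, 1, 5))     # the Fermat numbers F_0, ..., F_4
s = coprime_sequence(3, 4, 6)
print(all(gcd(x, y) == 1 for x, y in combinations(s, 2)))
```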
The second sequence we consider is called the sequence of Mersenne numbers. 


Definition 3.1.2 The Mersenne numbers are the sequence $(M_n)$ of positive integers defined by

$$M_n = 2^n - 1, \quad n = 1, 2, 3, \ldots$$

If a particular $M_n$ is prime it is called a Mersenne prime.


The Mersenne numbers were introduced by the French clergyman and mathematician M. Mersenne, who showed that if $M_n$ is a prime then $n$ must be a prime, and claimed that $M_n$ is a prime for

$$n = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257$$

and composite for all other $n \le 257$. It is now known that $M_{67}$ and $M_{257}$ are not primes while $M_{61}$ and $M_{89}$ are primes. Further $M_n$ is prime for several large exponents and the search for larger and larger primes generally revolves around Mersenne primes. As in the case of the Fermat primes it is still an open question whether or not there are infinitely many Mersenne primes. However for the Mersenne primes it is conjectured that there are infinitely many. As of May 2013 there were 48 known Mersenne primes, the largest of which is $M_{57885161}$. Further information on the search for larger Mersenne primes can be found at the internet site www.mersenne.org.


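Theorem 3.1.6 Suppose $a, n$ are positive integers with $n \ge 2$. If $a^n - 1$ is prime then $a = 2$ and $n$ is prime. In particular if a Mersenne number $M_n$ is a Mersenne prime then $n$ is prime.


Proof Assume $a \ge 3$. Then $(a - 1) \mid (a^n - 1)$ and $1 < a - 1 < a^n - 1$. Therefore if $a^n - 1$ is prime we must have $a = 2$. If $n = kl$ with $2 \le k, l < n$ then

$$2^{kl} - 1 = (2^k - 1)\left(2^{k(l-1)} + 2^{k(l-2)} + \cdots + 2^k + 1\right),$$

and $1 < 2^k - 1 < 2^n - 1$. Hence if $2^n - 1$ is prime $n$ must be prime.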


As is the theme of this chapter we will now use the Mersenne numbers to derive 
the infinitude of primes. 


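Lemma 3.1.4 For any pair of Mersenne numbers $M_m, M_n$ we have

$$(M_m, M_n) = (2^m - 1, 2^n - 1) = 2^{(m,n)} - 1 = M_{(m,n)}.$$


Proof This is certainly correct if $m = n$ or $n = 1$ or $m = 1$. Assume that $m > n > 1$. From the Euclidean algorithm applied to $m, n$ we have

$$\begin{aligned}
m &= nq + r_1 \\
n &= r_1q_1 + r_2 \\
&\;\;\vdots \\
r_{s-2} &= r_{s-1}q_{s-1} + r_s \\
r_{s-1} &= r_sq_s
\end{aligned}$$

and $r_s = (m, n)$. It follows then that

$$\begin{aligned}
2^m - 1 &= 2^{nq + r_1} - 1 = 2^{r_1}(2^{nq} - 1) + (2^{r_1} - 1) \\
2^n - 1 &= 2^{r_2}(2^{r_1q_1} - 1) + (2^{r_2} - 1) \\
&\;\;\vdots \\
2^{r_{s-1}} - 1 &= (2^{r_s} - 1)\left(2^{r_s(q_s - 1)} + \cdots + 2^{r_s} + 1\right).
\end{aligned}$$

This yields

$$(2^{r_s} - 1) \mid (2^{r_{s-1}} - 1) \quad\text{and}\quad (2^{r_s} - 1) \mid (2^{r_{s-2}} - 1),$$

since also

$$2^{r_{s-2}} - 1 = 2^{r_s}(2^{r_{s-1}q_{s-1}} - 1) + (2^{r_s} - 1)$$

and $2^{r_{s-1}} - 1$ divides $2^{r_{s-1}q_{s-1}} - 1$. Working back up the chain, finally

$$(2^{r_s} - 1) \mid (2^m - 1) \quad\text{and}\quad (2^{r_s} - 1) \mid (2^n - 1).$$

Suppose now that $d = (2^m - 1, 2^n - 1)$. Reading the same equations from the top down, $d \mid (2^{r_i} - 1)$ for $i = 1, \ldots, s$. Therefore $d \mid (2^{r_s} - 1) = 2^{(m,n)} - 1$, and since $2^{(m,n)} - 1$ is itself a common divisor, $d = 2^{(m,n)} - 1$.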


Now let $P = \{p_1, \ldots, p_n\}$ be a finite set of primes with

$$2 = p_1 < p_2 < \cdots < p_n.$$

Then

$$(2^{p_i} - 1, 2^{p_j} - 1) = 2^{(p_i, p_j)} - 1 = 1 \quad\text{if } i \neq j.$$

For $i = 1, \ldots, n$ each $2^{p_i} - 1$ is odd and greater than 1, and hence these numbers have pairwise different odd prime divisors. Since there are only $n - 1$ odd primes in $P$ it follows that there must be a prime number not in $P$.
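Lemma 3.1.4 and the argument that follows can be illustrated in a few lines. The helper `prime_not_in` is hypothetical (our own name); it assumes $2 \in P$, as in the text, and factors by trial division, so it is only meant for small primes in $P$.

```python
from math import gcd


def mersenne(n):
    return 2 ** n - 1


def smallest_prime_factor(n):
    """Least prime factor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_not_in(P):
    """For a finite set P of primes containing 2: the numbers 2^p - 1 (p in P)
    are odd, > 1 and pairwise coprime, so their least prime factors are
    len(P) distinct odd primes, while P has only len(P) - 1 odd primes."""
    for p in P:
        q = smallest_prime_factor(mersenne(p))
        if q not in P:
            return q


# Lemma 3.1.4 for all 1 <= m, n <= 40
print(all(gcd(mersenne(m), mersenne(n)) == mersenne(gcd(m, n))
          for m in range(1, 41) for n in range(1, 41)))
print(prime_not_in([2, 3, 5, 7, 11, 13]))   # 2^5 - 1 = 31 is the first new prime
```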


The Mersenne numbers are closely tied to what are called the perfect numbers. A natural number $n$ is a perfect number if it is equal to the sum of its proper divisors. That is

$$n = \sum_{d \mid n,\ 1 \le d < n} d.$$

For example the number 6 is perfect since its proper divisors are 1, 2, 3, which add up to 6. If we denote by $\sigma(n)$ the sum of all positive divisors of $n$, that is

$$\sigma(n) = \sum_{d \mid n,\ d \ge 1} d,$$

then $\sigma(n) = 2n$ if and only if $n$ is perfect. The following result, whose first part is due to Euclid and whose second part is due to Euler, gives the relation between perfect numbers and Mersenne primes.


Theorem 3.1.7 Let $(M_n)$ be the sequence of Mersenne numbers. Then

(1) (Euclid) If $M_p = 2^p - 1$ is a Mersenne prime then

$$n = 2^{p-1}(2^p - 1)$$

is a perfect number.

(2) (Euler) If $n \ge 2$ is a perfect number and even then

$$n = 2^{p-1}(2^p - 1)$$

and $M_p = 2^p - 1$ is a Mersenne prime.
Proof (1) Suppose $2^p - 1 = q$ is a prime and let $n = 2^{p-1}(2^p - 1)$. Then

$$\begin{aligned}
\sigma(n) &= 1 + 2 + \cdots + 2^{p-1} + q + 2q + \cdots + 2^{p-1}q \\
&= (q + 1)(1 + 2 + \cdots + 2^{p-1}) = 2^p(2^p - 1) = 2\left(2^{p-1}(2^p - 1)\right) = 2n.
\end{aligned}$$

Therefore $\sigma(n) = 2n$ and hence $n$ is a perfect number.

(2) Suppose $n$ is an even perfect number. Let $n = 2^tu$ with $u$ odd and $t \ge 1$. The divisors of $n$ are of the form $2^sm$ with $0 \le s \le t$ and $m \mid u$. Consider $s$ fixed and consider the divisors $2^sm$. Their contribution to the sum $\sigma(n)$ is equal to $2^s\sigma(u)$. It follows that

$$\sigma(n) = (1 + 2 + \cdots + 2^t)\sigma(u) = (2^{t+1} - 1)\sigma(u).$$

Since $n$ is perfect we have $\sigma(n) = 2n$ and hence

$$2^{t+1}u = (2^{t+1} - 1)\sigma(u).$$

Since $2^{t+1}$ and $2^{t+1} - 1$ are coprime, from Euclid's lemma we get

$$\sigma(u) = 2^{t+1}a \quad\text{and}\quad u = (2^{t+1} - 1)a$$

for some natural number $a$. The number $u$ has two different divisors $a$ and $(2^{t+1} - 1)a > a$. Their sum is $2^{t+1}a = \sigma(u)$. This is possible only if $u = (2^{t+1} - 1)a$ has no other divisors, that is if $a = 1$ and $2^{t+1} - 1$ is prime. It follows from Theorem 3.1.6 that $t + 1$ must be a prime, $2^{t+1} - 1$ is a Mersenne prime and $n = 2^t(2^{t+1} - 1)$ has the required form.
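A short sketch comparing the perfect numbers found by brute force with Euclid's formula $2^{p-1}(2^p - 1)$. The search bound 10000 and the primes $p = 2, 3, 5, 7$ (for which $M_p$ is prime) are illustrative choices; $\sigma(n)$ is computed by pairing each divisor $d \le \sqrt{n}$ with $n/d$.

```python
def sigma(n):
    """Sum of all positive divisors of n."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_perfect(n):
    return sigma(n) == 2 * n


perfect = [n for n in range(2, 10_000) if is_perfect(n)]
euclid = [2 ** (p - 1) * (2 ** p - 1) for p in (2, 3, 5, 7)]
print(perfect, euclid)   # [6, 28, 496, 8128] [6, 28, 496, 8128]
```

Since $M_{11} = 2047 = 23 \cdot 89$ is not prime, $2^{10}(2^{11} - 1)$ is not perfect, as the test below also confirms.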


This completely characterizes the even perfect numbers in terms of Mersenne primes. It is still an open question whether there is an odd perfect number.

Finally we mention a result called the Lucas—Lehmer Test which is useful in 
testing for large Mersenne primes. We will give this result again, as well as its proof, 
in Chapter 5 on primality testing. 


Theorem 3.1.8 Let $p$ be an odd prime and define the sequence $(S_k)$ inductively by

$$S_1 = 4 \quad\text{and}\quad S_k = S_{k-1}^2 - 2 \quad (k \ge 2).$$

Then the Mersenne number $M_p = 2^p - 1$ is a Mersenne prime if and only if $M_p$ divides $S_{p-1}$.
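Theorem 3.1.8 translates directly into code. In this sketch (function name ours) the terms $S_k$ are reduced modulo $M_p$ at each step, which does not change whether $M_p$ divides $S_{p-1}$ but keeps the numbers small. For example, it reports $M_{61}$, $M_{89}$ and $M_{127}$ prime and $M_{67}$, $M_{257}$ composite.

```python
def lucas_lehmer(p):
    """For an odd prime p: M_p = 2^p - 1 is prime iff M_p divides S_{p-1},
    where S_1 = 4 and S_k = S_{k-1}^2 - 2. Terms are kept modulo M_p."""
    M = 2 ** p - 1
    s = 4                        # S_1
    for _ in range(p - 2):       # compute S_2, ..., S_{p-1} modulo M
        s = (s * s - 2) % M
    return s == 0


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print([p for p in odd_primes if lucas_lehmer(p)])   # [3, 5, 7, 13, 17, 19, 31]
```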
3.1.4 The Fibonacci Numbers and the Golden Section


The next sequence of integers that we consider is called the Fibonacci numbers. 
This sequence has many remarkable properties, some of which we will explore in 
this section. The interest in this sequence, both by professional mathematicians and 
by amateurs, has been almost mystical and there is a whole journal The Fibonacci 
Quarterly devoted to results surrounding these numbers. In addition this sequence 
has an intricate tie to a number called the golden section or golden ratio which has 
tremendous and varied applications in geometry. 


Definition 3.1.3 The Fibonacci numbers are the sequence $(f_n)$ defined recursively by $f_1 = 1$, $f_2 = 1$ and then

$$f_n = f_{n-1} + f_{n-2} \quad\text{for } n \ge 3.$$

Hence the first few terms of the sequence are

$$1, 1, 2, 3, 5, 8, 13, 21, \ldots$$


This sequence was introduced by the Italian mathematician Leonardo Pisano, or Leonardo of Pisa, better known as Fibonacci (son of Bonnaccio), via a problem in his book Liber Abaci published in 1202. In this problem he asked the following.


How many pairs of rabbits will be produced in a year, beginning with a single pair, if in every month each pair bears a new pair which becomes productive from the second month on?


This leads to the following scheme, with A being a productive pair and B a pair that becomes productive from the second month on (Figure 3.1). Computing, we then get the following table.


Fig. 3.1 Scheme for Fibonacci's Rabbit Problem
No. of A    No. of B    Total number
   1           0            1
   1           1            2
   2           1            3
   3           2            5


and so on, which produces the recursive formula giving the Fibonacci numbers. 
An alternative formulation of the Fibonacci numbers can be given by the next 
theorem. 


Theorem 3.1.9 Let $P_1 = P_2 = 1$ and for $n \ge 3$ let $P_n$ be the number of 0-1 sequences of length $n - 2$ with no two consecutive 1's. Then $P_n = f_n$ for all $n$.


Proof This is clear for $n = 3$ and $n = 4$ (the sequences are $0, 1$ and $00, 01, 10$). For $n \ge 3$ let $g_n$ be the number of 0-1 sequences of length $n - 2$ with no two consecutive 1's ending in 0, and let $h_n$ be the number of such sequences ending in 1, so that $P_n = g_n + h_n$. Each such sequence of length $n - 2$ ending in 0 gives 2 new sequences of length $n - 1$ (append 0 or 1), while each sequence ending in 1 gives only one new sequence (append 0). Therefore

$$g_{n+1} = g_n + h_n \quad\text{and}\quad h_{n+1} = g_n,$$

and, since $g_n = g_{n-1} + h_{n-1} = P_{n-1}$ for $n \ge 4$,

$$P_{n+1} = g_{n+1} + h_{n+1} = (g_n + h_n) + g_n = P_n + P_{n-1} \quad (n \ge 4).$$

The result follows easily from this.
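Theorem 3.1.9 can be tested by brute force, and the recursion for $g_n$ and $h_n$ from the proof can be run alongside it. This is a sketch with our own function names; the brute-force count enumerates all $2^{n-2}$ strings, so it is only practical for small $n$.

```python
from itertools import product


def fib(n):
    """f_n with f_1 = f_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def P_brute(n):
    """Number of 0-1 sequences of length n - 2 with no two consecutive 1's (n >= 3)."""
    return sum("11" not in "".join(bits) for bits in product("01", repeat=n - 2))


def P_recursive(n):
    """Same count via g (sequences ending in 0) and h (ending in 1), n >= 3."""
    g, h = 1, 1                  # length 1: "0" and "1"
    for _ in range(n - 3):
        g, h = g + h, g          # append 0 to anything; append 1 only after a 0
    return g + h


print([P_brute(n) for n in range(3, 12)])
print([fib(n) for n in range(3, 12)])
```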


The properties of the Fibonacci numbers are intricately tied to the number

$$a = \frac{1 + \sqrt5}{2}.$$


This number is called the golden section or golden ratio and arises naturally in many 
geometric applications. Before continuing with the Fibonacci numbers we digress 
and discuss the golden section and its ties to geometry. 

To define $a$, consider a line segment $AB$, and let the point $P$ be located so that it divides the line segment in extreme and mean ratio. By this we mean that

$$\frac{|AP|}{|PB|} = \frac{|AB|}{|AP|}.$$

If we let $PB$ have length 1 as in Figure 3.2 then the length of $AP$ is the golden section $a$. To see that the value of $a$ is $\frac{1 + \sqrt5}{2}$ we have the ratio

$$\frac{a}{1} = \frac{a + 1}{a}.$$

This then gives the quadratic equation

$$a^2 - a - 1 = 0.$$

The two solutions are $\frac{1 \pm \sqrt5}{2}$ and since the golden ratio is positive we get that $a = \frac{1 + \sqrt5}{2}$ as desired.

Fig. 3.2 Extreme and Mean Ratio

Fig. 3.3 Golden Rectangle

If we have a rectangle $ABCD$ with $|BC| = a$ and $|CD| = 1$ as in Figure 3.3 then this is a golden rectangle.

The classical Greeks regarded the golden rectangle as the most pleasing rectan- 
gular shape and built many of their temple fronts with this format. 

If we begin with a golden rectangle $ABCD$ as in Figure 3.3 and remove the square $ABEF$, the remaining rectangle $ECDF$ is again a golden rectangle. To see this suppose that $|BC| = a$ and $|CD| = 1$. Then $|EC| = a - 1$ and

$$\frac{|DC|}{|EC|} = \frac{1}{a - 1} = \frac{1}{\frac{1 + \sqrt5}{2} - 1} = \frac{2}{\sqrt5 - 1} = \frac{1 + \sqrt5}{2} = a.$$


This process of removing squares can be continued and each time we get a smaller golden rectangle as in Figure 3.4. Starting with $A$, if opposite corners of the successive squares are connected by circular arcs with radius the side of the given square, we get an approximation of a logarithmic spiral called the golden spiral. Its equation in polar coordinates is $r = a^{2\theta/\pi}$.

Fig. 3.4 Golden Spiral

Fig. 3.5 Golden Section Relative to an Inscribed Square

The golden section is of course an irrational number. However it can be constructed very easily with ruler and compass. To do this, start with a line segment $AB$ of length 1, and a line segment $AE$ of length $\frac{1}{2}$ orthogonal to $AB$. Then the segment $EB$ has length $\sqrt{1 + \frac{1}{4}} = \frac{\sqrt5}{2}$. Extend $EB$ beyond $B$ by a line segment $BC$ of length $\frac{1}{2}$; then $EC$ has length $\frac{\sqrt5}{2} + \frac{1}{2} = a$.

The golden section arises naturally in many geometric applications. We describe 
several of these. First, consider a square inscribed in a semicircle of radius R as 
pictured in Figure 3.5. 

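Suppose $|AB| = r$ and let $x$ be the length of the side of the inscribed square. Then $r = R + \frac{x}{2}$. We then have from the Pythagorean theorem $R^2 = x^2 + \left(\frac{x}{2}\right)^2$, so that

$$x = \frac{2}{\sqrt5}R.$$

But then

$$|AB| = r = R\left(1 + \frac{1}{\sqrt5}\right) \quad\text{and}\quad r - x = R\left(1 - \frac{1}{\sqrt5}\right).$$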


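Since

$$\frac{1 + \frac{1}{\sqrt5}}{\frac{2}{\sqrt5}} = \frac{\sqrt5 + 1}{2} = a \quad\text{and}\quad \frac{\frac{2}{\sqrt5}}{1 - \frac{1}{\sqrt5}} = \frac{2}{\sqrt5 - 1} = a,$$

we have

$$\frac{r}{x} = \frac{x}{r - x} = a,$$

that is the point $C$ divides the line segment $AB$ by the golden ratio.

Fig. 3.6 Regular Decagon Inscribed in a Circle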

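Next consider a regular decagon inscribed in a circle of radius $R$. A side $S_{10}$ has length $2R\sin(\frac{\pi}{10})$ (Figure 3.6).

Using the trigonometric identities

$$\sin(2\theta) = 2\sin\theta\cos\theta, \qquad \cos(2\theta) = 1 - 2\sin^2\theta$$

and

$$\sin\left(\frac{2\pi}{5}\right) = \cos\left(\frac{\pi}{10}\right),$$

we get

$$\cos\left(\frac{\pi}{10}\right) = \sin\left(\frac{2\pi}{5}\right) = 2\sin\left(\frac{\pi}{5}\right)\cos\left(\frac{\pi}{5}\right) = 4\sin\left(\frac{\pi}{10}\right)\cos\left(\frac{\pi}{10}\right)\left(1 - 2\sin^2\left(\frac{\pi}{10}\right)\right),$$

and dividing by $\cos(\frac{\pi}{10}) \neq 0$,

$$4\sin\left(\frac{\pi}{10}\right)\left(1 - 2\sin^2\left(\frac{\pi}{10}\right)\right) = 1.$$

Therefore the value of $\sin(\frac{\pi}{10})$ is a solution of the polynomial equation

$$4x(1 - 2x^2) = 1,$$

that is, $8x^3 - 4x + 1 = (2x - 1)(4x^2 + 2x - 1) = 0$. Since $\sin(\frac{\pi}{10}) > 0$ and $\sin(\frac{\pi}{10}) \neq \frac{1}{2}$ we obtain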
$$\sin\left(\frac{\pi}{10}\right) = \frac{\sqrt5 - 1}{4} = \frac{a - 1}{2} = \frac{1}{2a},$$

where $a$ is the golden section. Therefore

$$|S_{10}| = 2R\sin\left(\frac{\pi}{10}\right) = R(a - 1) = \frac{R}{a}.$$

Hence the side of a regular decagon inscribed in a circle is the bigger section of the radius divided by the golden section.

Fig. 3.7 Regular Pentagon
Using this connection it is easy to construct regular decagons and regular pentagons with ruler and compass.
Next consider a regular pentagon. Its diagonals describe a regular star, the pentagram, as in Figure 3.7.
The angle $\angle AFD$ is $\frac{3\pi}{5}$ while the angle $\angle ADF$ is $\frac{\pi}{5}$. From the law of sines we have

$$\frac{|AD|}{|AF|} = \frac{\sin(\frac{3\pi}{5})}{\sin(\frac{\pi}{5})} = 2\cos\left(\frac{\pi}{5}\right) = a,$$

since

$$2\cos\left(\frac{\pi}{5}\right) = 2\left(1 - 2\sin^2\left(\frac{\pi}{10}\right)\right) = 2 - \frac{1}{a^2} = a.$$

Because $|AF| = |AC|$ we have $\frac{|AD|}{|AC|} = a$ and hence the point $C$ divides the line segment $AD$ by the golden ratio.


Finally consider a rectangle as in Figure 3.8. 

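We wish to find the points $Q$ on $AB$ and $P$ on $AD$, with $|AQ| = x$, $|QB| = y$, $|AP| = w$ and $|PD| = z$, so that the triangles $\triangle PAQ$, $\triangle QBC$ and $\triangle CDP$ all have equal area.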

Fig. 3.8 Rectangle

If the triangles do have equal area we have the identities

$$xw = y(w + z) = z(x + y) \implies xw = yw + yz = xz + yz.$$

This implies that

$$yw = xz \implies \frac{w}{z} = \frac{x}{y}.$$

Then from $xw = y(w + z)$ we get

$$\frac{x}{y} = \frac{w + z}{w} = 1 + \frac{z}{w} = 1 + \frac{y}{x} = 1 + \frac{1}{x/y}.$$

This means that

$$\left(\frac{x}{y}\right)^2 - \frac{x}{y} - 1 = 0 \implies \frac{x}{y} = \frac{w}{z} = a.$$

Hence the solution to the equal area problem is precisely the points $P$ and $Q$ which divide the sides $AB$ and $AD$ in the golden ratio.
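The golden-section identities of this subsection can be checked numerically. This sketch uses floating point, so it confirms the algebra only to about 12 digits; the radius $R = 1$ and the choice $y = z = 1$ for the rectangle are arbitrary.

```python
import math

a = (1 + math.sqrt(5)) / 2          # golden section
R = 1.0                             # radius; any positive value works
x_sq = 2 * R / math.sqrt(5)         # side of the square inscribed in the semicircle
r = R + x_sq / 2                    # |AB|

residuals = {
    "a^2 - a - 1 = 0": a * a - a - 1,
    "golden rectangle: 1/(a - 1) = a": 1 / (a - 1) - a,
    "inscribed square: r/x = a": r / x_sq - a,
    "inscribed square: x/(r - x) = a": x_sq / (r - x_sq) - a,
    "decagon: sin(pi/10) = 1/(2a)": math.sin(math.pi / 10) - 1 / (2 * a),
    "decagon side: 2R sin(pi/10) = R/a": 2 * R * math.sin(math.pi / 10) - R / a,
    "pentagon: sin(3pi/5)/sin(pi/5) = a": math.sin(3 * math.pi / 5) / math.sin(math.pi / 5) - a,
}

# Equal-area rectangle: take x/y = w/z = a and compare the three triangle areas.
x, y, z = a, 1.0, 1.0
w = a * z
areas = (x * w / 2, y * (w + z) / 2, z * (x + y) / 2)   # PAQ, QBC, CDP

for name, value in residuals.items():
    print(f"{name}: residual {value:.1e}")
print(areas)
```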


We now return to the Fibonacci numbers and first show the tie to the golden 
section. 


Theorem 3.1.10 (Binet Formula) Let $(f_n)$ be the Fibonacci sequence, let $a = \frac{1 + \sqrt5}{2}$ be the golden section and let $\beta = -a^{-1} = \frac{1 - \sqrt5}{2}$. Then for $n \ge 1$,

$$f_n = \frac{a^n - \beta^n}{a - \beta}.$$


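Proof The golden section $a$ together with $\beta$ as defined in the statement of the theorem are the zeros of the polynomial

$$x^2 - x - 1.$$

It follows that

$$a^{i+2} = a^{i+1} + a^i \quad\text{and}\quad \beta^{i+2} = \beta^{i+1} + \beta^i \quad (i \ge 0).$$

Further $a - \beta = \sqrt5 \neq 0$. We then have

$$\frac{a - \beta}{a - \beta} = 1 = f_1, \qquad \frac{a^2 - \beta^2}{a - \beta} = a + \beta = 1 = f_2,$$

and, by induction,

$$\frac{a^{n+2} - \beta^{n+2}}{a - \beta} = \frac{a^{n+1} - \beta^{n+1}}{a - \beta} + \frac{a^n - \beta^n}{a - \beta} = f_{n+1} + f_n = f_{n+2}$$

for $n \ge 1$.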
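Corollary 3.1.2 If $f_n$ and $a$ are as above then

$$\lim_{n \to \infty} \frac{f_{n+1}}{f_n} = a = 1 + \frac{1}{a}.$$

Proof From the Binet formula

$$\frac{f_{n+1}}{f_n} = \frac{a^{n+1} - \beta^{n+1}}{a^n - \beta^n} = a\,\frac{1 - (\beta/a)^{n+1}}{1 - (\beta/a)^n}.$$

Since $|\beta/a| < 1$ this clearly goes to $a$ as $n \to \infty$. Further, by rearranging the recursion it is easily seen that

$$\frac{f_{n+1}}{f_n} = 1 + \frac{f_{n-1}}{f_n} = 1 + \frac{1}{f_n/f_{n-1}},$$

so that the limit satisfies $a = 1 + \frac{1}{a}$.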
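The Binet formula and the limit of $f_{n+1}/f_n$ can be compared against the exact recursion. This is a sketch: floating-point evaluation of $a^n$ is only trustworthy for moderate $n$ (we stop at $n = 59$), and exact arithmetic would be needed beyond that.

```python
import math

a = (1 + math.sqrt(5)) / 2
beta = (1 - math.sqrt(5)) / 2      # = -1/a


def fib(n):
    """Exact f_n (f_1 = f_2 = 1) by iteration."""
    prev, cur = 0, 1
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def binet(n):
    """Binet formula evaluated in floating point."""
    return (a ** n - beta ** n) / (a - beta)


print(all(round(binet(n)) == fib(n) for n in range(1, 60)))
print(fib(41) / fib(40), a)        # consecutive ratio vs. the golden section
```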


We now list a collection of properties of the Fibonacci numbers. In addition to showing the rich theory of these numbers they will lead us to two more proofs of the infinitude of primes. Throughout all the remainder of this section the $f_n$ are the Fibonacci numbers and $a$ is the golden section.


Lemma 3.1.5 $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$, $n \ge 1$.


Proof This is correct for $n = 1$ and $n = 2$. For $n \ge 3$ we have

$$f_1 + f_2 + \cdots + f_{n-1} + f_n = f_{n+1} - 1 + f_n = f_{n+2} - 1.$$
The next three results are again straightforward inductions, the first two directly on $n$ and the third fixing $n$ and inducting on $m$. We leave the details to the exercises.


Lemma 3.1.6 $f_nf_{n+1} = f_1^2 + f_2^2 + \cdots + f_n^2$ for $n \ge 1$.

Lemma 3.1.7 $f_n^2 - f_{n-1}f_{n+1} = (-1)^{n-1}$ for $n > 1$.

Lemma 3.1.8 $f_{n+m} = f_{n-1}f_m + f_nf_{m+1}$ for $n, m \ge 1$, where $f_0 = 0$.


Lemma 3.1.9 (a) If $r, s$ are positive integers then $r$ dividing $s$ implies that $f_r$ divides $f_s$. Conversely if $n > 2$ and $f_n \mid f_m$ then $n \mid m$.

(b) $(f_n, f_m) = f_{(m,n)}$. That is, the gcd of $f_n$ and $f_m$ is the Fibonacci number whose index is the gcd $(m, n)$. In particular $f_n$ and $f_m$ are relatively prime if $m$ and $n$ are relatively prime.


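Proof (a) Recall that $a\beta = -1$ and $a + \beta = 1$. If $s = rk$ we then have

$$f_s = \frac{a^{rk} - \beta^{rk}}{a - \beta} = \frac{a^r - \beta^r}{a - \beta}\cdot\frac{a^{rk} - \beta^{rk}}{a^r - \beta^r} = f_r\left(a^{r(k-1)} + a^{r(k-2)}\beta^r + \cdots + \beta^{r(k-1)}\right).$$

The last factor is symmetric in $a$ and $\beta$ with integer coefficients, hence an integer polynomial in $a + \beta = 1$ and $a\beta = -1$, that is, an integer. Hence if $r \mid s$ then $f_r \mid f_s$.

We need part (b) in order to prove the converse. Suppose that $m > n$. Then by the Euclidean algorithm we have $r_t = (m, n)$ where

$$\begin{aligned}
m &= nq + r_1 && \text{with } 0 < r_1 < n \\
n &= r_1q_1 + r_2 && \text{with } 0 < r_2 < r_1 \\
&\;\;\vdots \\
r_{t-2} &= r_{t-1}q_{t-1} + r_t && \text{with } 0 < r_t < r_{t-1} \\
r_{t-1} &= r_tq_t.
\end{aligned}$$

Then applying this to the corresponding Fibonacci numbers we have from Lemma 3.1.8:

$$(f_n, f_m) = (f_{nq + r_1}, f_n) = (f_{nq-1}f_{r_1} + f_{nq}f_{r_1+1}, f_n) = (f_{nq-1}f_{r_1}, f_n) = (f_{r_1}, f_n)$$

because $f_n \mid f_{nq}$ from the first part of part (a) and $(f_{nq}, f_{nq-1}) = 1$, so that also $(f_{nq-1}, f_n) = 1$. (Clearly two neighboring Fibonacci numbers are relatively prime.)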
Analogously

$$(f_{r_1}, f_n) = (f_{r_2}, f_{r_1}) = \cdots = (f_{r_t}, f_{r_{t-1}}) = f_{r_t},$$

since $f_{r_t} \mid f_{r_{t-1}}$. This completes the proof of part (b).

We now consider the second half of part (a). Suppose that $n > 2$ and that $f_n \mid f_m$. Then

$$f_n = (f_n, f_m) = f_{(m,n)}$$

from part (b). It follows that $n \mid m$, since $(m, n) \le n$, $n > 2$, $f_1 = 1 < f_n$ and $f_r < f_s$ if $2 \le r < s$.
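Lemmas 3.1.5 through 3.1.9 can be checked by brute force for small indices. The sketch uses $f_0 = 0$ as in Lemma 3.1.8; the bound $N = 60$ and the function names are our own choices.

```python
from math import gcd


def fib_list(N):
    """Return [f_0, f_1, ..., f_N] with f_0 = 0 and f_1 = f_2 = 1."""
    f = [0, 1]
    while len(f) <= N:
        f.append(f[-1] + f[-2])
    return f


def check_fibonacci_identities(N=60):
    """Brute-force check of Lemmas 3.1.5 to 3.1.9 for 1 <= m, n <= N."""
    F = fib_list(2 * N + 1)
    for n in range(1, N + 1):
        assert sum(F[1:n + 1]) == F[n + 2] - 1                       # Lemma 3.1.5
        assert sum(v * v for v in F[1:n + 1]) == F[n] * F[n + 1]     # Lemma 3.1.6
        assert F[n] ** 2 - F[n - 1] * F[n + 1] == (-1) ** (n - 1)    # Lemma 3.1.7
        for m in range(1, N + 1):
            assert F[n + m] == F[n - 1] * F[m] + F[n] * F[m + 1]     # Lemma 3.1.8
            assert gcd(F[n], F[m]) == F[gcd(m, n)]                   # Lemma 3.1.9(b)
            if n > 2 and F[m] % F[n] == 0:
                assert m % n == 0                                    # Lemma 3.1.9(a)
    return True


F = fib_list(10)
print(check_fibonacci_identities())
# Why n > 2 matters in Lemma 3.1.9(a): f_2 = 1 divides f_3 = 2, but 2 does not divide 3.
print(F[3] % F[2] == 0, 3 % 2 == 0)    # True False
```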


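Lemma 3.1.10 (a) $f_{2k} = f_k(f_{k-1} + f_{k+1}) = f_{k+1}^2 - f_{k-1}^2$.

(b) $f_{2k} = \sum_{i=0}^{k}\binom{k}{i}f_i$, where $\binom{k}{i}$ is the binomial coefficient and $f_0 = 0$.

(c) $f_{n+1} = \sum_{i=0}^{[n/2]}\binom{n-i}{i}$, where $[x]$ is the greatest integer function.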

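Proof These are all applications of the Binet formula. For part (a) we use $a^{k+1} + a^{k-1} = a^k(a - \beta)$ and $\beta^{k+1} + \beta^{k-1} = -\beta^k(a - \beta)$, which follow from $a^{-1} = -\beta$ and $\beta^{-1} = -a$. Then

$$f_{2k} = \frac{a^{2k} - \beta^{2k}}{a - \beta} = f_k(a^k + \beta^k) = f_k\left(\frac{a^{k-1} - \beta^{k-1}}{a - \beta} + \frac{a^{k+1} - \beta^{k+1}}{a - \beta}\right)$$

$$= f_k(f_{k-1} + f_{k+1}) = f_{k+1}^2 - f_{k-1}^2,$$

the last step because $f_k = f_{k+1} - f_{k-1}$.

For part (b) apply the Binet formula to obtain

$$\sum_{i=0}^{k}\binom{k}{i}f_i = \frac{1}{a - \beta}\sum_{i=0}^{k}\binom{k}{i}(a^i - \beta^i) = \frac{1}{a - \beta}\left((1 + a)^k - (1 + \beta)^k\right) = \frac{1}{a - \beta}\left(a^{2k} - \beta^{2k}\right) = f_{2k},$$

since $1 + a = a^2$ and $1 + \beta = \beta^2$.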
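Finally for part (c), it clearly holds for $n = 0$ and $n = 1$. Suppose now $n \ge 2$ and we proceed by induction. Then

$$f_{n+1} = f_n + f_{n-1} = \sum_{i=0}^{[(n-1)/2]}\binom{n-1-i}{i} + \sum_{i=0}^{[(n-2)/2]}\binom{n-2-i}{i}.$$

We first consider the case where $n = 2m$ with $m \ge 1$. Then $[\frac{n-1}{2}] = m - 1 = [\frac{n-2}{2}]$ and hence from above, using Pascal's rule $\binom{N-1}{j} + \binom{N-1}{j-1} = \binom{N}{j}$,

$$\begin{aligned}
f_{n+1} &= \sum_{i=0}^{m-1}\binom{2m-1-i}{i} + \sum_{i=0}^{m-1}\binom{2m-1-(i+1)}{(i+1)-1} \\
&= \binom{2m-1}{0} + \sum_{j=1}^{m-1}\left(\binom{2m-1-j}{j} + \binom{2m-1-j}{j-1}\right) + \binom{m-1}{m-1} \\
&= \binom{2m}{0} + \sum_{j=1}^{m-1}\binom{2m-j}{j} + \binom{m}{m} = \sum_{j=0}^{m}\binom{2m-j}{j},
\end{aligned}$$

completing the even case.

Now suppose $n$ is odd, so $n = 2m + 1$ with $m \ge 1$. Then

$$\left[\frac{n-1}{2}\right] = m \quad\text{and}\quad \left[\frac{n-2}{2}\right] = m - 1,$$

and hence

$$\begin{aligned}
f_{n+1} &= \sum_{i=0}^{m}\binom{2m-i}{i} + \sum_{i=0}^{m-1}\binom{2m-1-i}{i} \\
&= \binom{2m}{0} + \sum_{j=1}^{m}\left(\binom{2m-j}{j} + \binom{2m-j}{j-1}\right) = \binom{2m+1}{0} + \sum_{j=1}^{m}\binom{2m+1-j}{j} = \sum_{j=0}^{m}\binom{2m+1-j}{j},
\end{aligned}$$

finishing the odd case and part (c).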
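The three identities of Lemma 3.1.10 can be checked with exact integer arithmetic; `math.comb` supplies the binomial coefficients. The range $K = 30$ is an arbitrary choice, and the function names are ours.

```python
from math import comb


def fib_list(N):
    """Return [f_0, f_1, ..., f_N] with f_0 = 0 and f_1 = f_2 = 1."""
    f = [0, 1]
    while len(f) <= N:
        f.append(f[-1] + f[-2])
    return f


def check_lemma_3_1_10(K=30):
    F = fib_list(2 * K + 2)
    for k in range(1, K + 1):
        assert F[2 * k] == F[k] * (F[k - 1] + F[k + 1])                    # (a)
        assert F[2 * k] == F[k + 1] ** 2 - F[k - 1] ** 2                   # (a)
        assert F[2 * k] == sum(comb(k, i) * F[i] for i in range(k + 1))   # (b)
    for n in range(0, 2 * K):
        assert F[n + 1] == sum(comb(n - i, i) for i in range(n // 2 + 1))  # (c)
    return True


print(check_lemma_3_1_10())
print(sum(comb(4 - i, i) for i in range(3)))   # f_5 = C(4,0) + C(3,1) + C(2,2)
```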


The next result and corollary deal with the relationship between the Fibonacci 
numbers and the primes. This will lead directly to another proof that there are infi- 
nitely many primes. 


Theorem 3.1.11 Let $p$ be a prime. Then

(1) $p \mid f_p$ if $p = 5$, and $p \mid f_{p-1}$ or $p \mid f_{p+1}$ if $p \neq 5$.

(2) $p \mid f_{p+1}$ if $p = 2$.

(3) $p \mid f_{p-1}$ if $p$ is congruent to $\pm 1$ modulo 10.

(4) $p \mid f_{p+1}$ if $p$ is congruent to $\pm 3$ modulo 10.


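Proof If $p = 2$ then $f_3 = 2$ and hence $p \mid f_{p+1}$. If $p = 3$ then $f_4 = 3$ and $p \mid f_{p+1}$. If $p = 5$ then $f_5 = 5$ and $p \mid f_p$. Now let $p \ge 7$. By Binet's formula

$$f_n = \frac{1}{\sqrt5}\left(\frac{1 + \sqrt5}{2}\right)^n - \frac{1}{\sqrt5}\left(\frac{1 - \sqrt5}{2}\right)^n$$

and by the binomial expansion

$$(1 + \sqrt5)^n = 1 + \binom{n}{1}\sqrt5 + \binom{n}{2}5 + \binom{n}{3}5\sqrt5 + \cdots.$$

If $n$ is odd then

$$2^{n-1}f_n = \frac{1}{2\sqrt5}\left((1 + \sqrt5)^n - (1 - \sqrt5)^n\right) = n + \binom{n}{3}5 + \binom{n}{5}5^2 + \cdots + 5^{\frac{n-1}{2}}.$$

Now let $n = p$ be prime. Since $p \mid \binom{p}{i}$ if $1 \le i \le p - 1$, and $2^{p-1} \equiv 1 \bmod p$, we must have

$$f_p \equiv 5^{\frac{p-1}{2}} \bmod p \quad\text{and hence}\quad f_p^2 \equiv 5^{p-1} \equiv 1 \bmod p$$

by Fermat's theorem. Since

$$f_p^2 - f_{p-1}f_{p+1} = (-1)^{p-1} = 1$$

we get

$$0 \equiv f_p^2 - 1 \equiv f_{p-1}f_{p+1} \bmod p.$$

Therefore $p \mid f_{p+1}$ or $p \mid f_{p-1}$, but not both, since $(f_{p-1}, f_{p+1}) = f_{(p-1, p+1)} = f_2 = 1$. More concretely, we can use the above identities to show that

$p \mid f_{p-1}$ if $p$ is congruent to $\pm 1$ modulo 10 and

$p \mid f_{p+1}$ if $p$ is congruent to $\pm 3$ modulo 10 (see exercises).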


Corollary 3.1.3 Let $p$ be a prime with $p \ge 7$. Then each prime divisor of $f_p$ is greater than $p$.


Proof Let $q$ be a prime divisor of $f_p$ with $p \ge 7$ a prime. Assume $q \le p$. If $q = p$ then, since $(f_p, f_{p-1}) = (f_p, f_{p+1}) = f_1 = 1$, Theorem 3.1.11 forces $q = p = 5$, which is excluded; hence we may assume that $q < p$. We then have

$$(f_p, f_q) = f_{(p,q)} = f_1 = 1, \quad (f_p, f_{q-1}) = f_{(p,q-1)} = f_1 = 1, \quad (f_p, f_{q+1}) = f_{(p,q+1)} = f_1 = 1.$$

Then from Theorem 3.1.11, either $q \mid f_q$ or $q \mid f_{q-1}$ or $q \mid f_{q+1}$. This gives a contradiction because $q \mid f_p$ and $q \mid f_q$ implies that $q \mid f_1 = 1$, and $q \mid f_p$ together with $q \mid f_{q+1}$ or $q \mid f_{q-1}$ also implies that $q \mid 1$. Therefore we must have $q > p$.
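Theorem 3.1.11 and Corollary 3.1.3 can be spot-checked numerically. The sketch below factors by trial division, so the corollary is only checked for primes $7 \le p < 60$ to keep the running time small; the factorizations of $f_{19}$ and $f_{31}$ used in Proof Two below are checked as well. Function names are our own.

```python
def fib_list(N):
    """Return [f_0, f_1, ..., f_N] with f_0 = 0 and f_1 = f_2 = 1."""
    f = [0, 1]
    while len(f) <= N:
        f.append(f[-1] + f[-2])
    return f


def smallest_prime_factor(n):
    """Least prime factor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


F = fib_list(400)
PRIMES = [p for p in range(2, 300) if smallest_prime_factor(p) == p]


def check_theorem_3_1_11():
    """Parts (2)-(4) of Theorem 3.1.11 and the case p = 5, for primes p < 300."""
    for p in PRIMES:
        if p == 2:
            assert F[3] % 2 == 0
        elif p == 5:
            assert F[5] % 5 == 0
        elif p % 10 in (1, 9):       # p ≡ ±1 (mod 10)
            assert F[p - 1] % p == 0
        else:                        # p ≡ ±3 (mod 10)
            assert F[p + 1] % p == 0
    return True


def check_corollary_3_1_3(limit=60):
    """Every prime factor of f_p exceeds p, for primes 7 <= p < limit."""
    return all(smallest_prime_factor(F[p]) > p for p in PRIMES if 7 <= p < limit)


print(check_theorem_3_1_11(), check_corollary_3_1_3())
```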


Based on the Fibonacci numbers, we can now give two more proofs of the fact 
that there are infinitely many primes. 


Proof One: Let $M = \{p_1, \ldots, p_n\}$ be a finite set of distinct prime numbers and suppose that $p_1 < p_2 < \cdots < p_n$ with $p_n \ge 7$. Let $p$ be a prime divisor of $f_{p_n}$. Then from Corollary 3.1.3 we must have $p > p_n$ and hence $p \notin M$.
Proof Two: Suppose $\{p_1, \ldots, p_n\}$ with $p_1 = 2$ are all the prime numbers. We have $f_{p_i} > 1$ for $i = 2, \ldots, n$. Then at most one of the $f_{p_i}$, $i = 2, \ldots, n$, has two different prime divisors, for otherwise, since $(f_{p_i}, f_{p_j}) = f_{(p_i, p_j)} = 1$ for $i \neq j$, we would already have $n + 1$ primes. This contradicts, for example, that

$$f_{19} = 37 \cdot 113 \quad\text{and}\quad f_{31} = 557 \cdot 2417.$$


We note that many of the ideas concerning the Fibonacci numbers can be greatly generalized. For example suppose $K$ is an arbitrary field and $x, y \in K$. Then we define

$$T_0(x, y) = 0, \quad T_1(x, y) = 1 \quad\text{and then}\quad T_n(x, y) = xT_{n-1}(x, y) - yT_{n-2}(x, y) \quad (n \ge 2).$$

This sequence in $K$ will satisfy many of the same properties as the Fibonacci numbers. If $A$ is a $2 \times 2$ invertible matrix over $K$ with $\operatorname{tr}(A) = x$ and $\det(A) = y$ then

$$A^n = T_n(x, y)A - yT_{n-1}(x, y)I,$$

where $I$ is the identity matrix. In particular

$$T_n(x, y)^2 - T_{n-1}(x, y)T_{n+1}(x, y) = y^{n-1}, \quad n \ge 1.$$

If $x = 1$ and $y = -1$ then $T_n(x, y) = f_n$ for $n \ge 1$.
These generalized Fibonacci numbers are also related to the Chebyshev polynomials, which play a role in the general approximation of functions. If $y = 1$ and $n \ge 1$ then

$$T_n(x, 1) = S_n(x),$$

where $S_n(x)$ is the $n$th Chebyshev polynomial of the second kind, normalized here so that $S_0 = 0$, $S_1 = 1$ and $S_n(x) = xS_{n-1}(x) - S_{n-2}(x)$. We have

$$S_{n+m}(x) = S_n(x)S_{m+1}(x) - S_m(x)S_{n-1}(x)$$

and

$$S_{2n}(x) = S_n(x)\left(S_{n+1}(x) - S_{n-1}(x)\right)$$

for all natural numbers $n, m$. As polynomials in $x$ these Chebyshev polynomials satisfy

$$S_{(m,n)}(x) = (S_n(x), S_m(x)).$$

For positive real values these Chebyshev polynomials have a particularly simple form. If $K = \mathbb{R}$ and $0 < x < 2$, let $x = 2\cos\theta$. Then

$$S_n(x) = \frac{\sin(n\theta)}{\sin(\theta)}.$$

If $x = 2\cosh\theta > 2$ then

$$S_n(x) = \frac{\sinh(n\theta)}{\sinh(\theta)},$$

while if $x = 2$ then

$$S_n(x) = n.$$
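The generalized sequence $T_n(x, y)$ and the matrix formula can be tested with integer matrices. This is a sketch: matrices are plain nested lists, the sample matrices are arbitrary, and the Chebyshev comparison is done in floating point for one sample angle.

```python
import math


def T(n, x, y):
    """T_0 = 0, T_1 = 1, T_n = x*T_{n-1} - y*T_{n-2}."""
    if n == 0:
        return 0
    prev, cur = 0, 1            # T_0, T_1
    for _ in range(n - 1):
        prev, cur = cur, x * cur - y * prev
    return cur


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def check_power_formula(A, nmax=20):
    """Check A^n = T_n(x, y) A - y T_{n-1}(x, y) I with x = tr A, y = det A."""
    x = A[0][0] + A[1][1]
    y = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    P = [[1, 0], [0, 1]]
    for n in range(1, nmax + 1):
        P = matmul(P, A)                                   # P = A^n
        t, s = T(n, x, y), T(n - 1, x, y)
        rhs = [[t * A[i][j] - (y * s if i == j else 0) for j in range(2)] for i in range(2)]
        if P != rhs:
            return False
    return True


print(check_power_formula([[2, 1], [3, 4]]), check_power_formula([[1, 1], [1, 0]]))
print([T(n, 1, -1) for n in range(1, 9)])                  # Fibonacci numbers
```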


3.1.5 Some Simple Cases of Dirichlet’s Theorem 


Recall that Dirichlet's Theorem, which we will state and prove formally in Section 3.3, says that if $a, b$ are positive integers with $(a, b) = 1$ then there are infinitely many primes of the form $an + b$. In this section we prove certain special cases of this result which can be handled by elementary methods. Most of these proofs depend on the following easy idea. Suppose $x \in \mathbb{N}$ has the prime factorization

$$x = p_1^{e_1}\cdots p_k^{e_k}.$$

Then if each $p_i \equiv 1 \bmod m$ then $x \equiv 1 \bmod m$. This fact is direct from the multiplicative property of congruences.
We first handle the case modulo 4.


Lemma 3.1.11 There exist infinitely many primes of the form $4n + 3$ and infinitely many of the form $4n + 1$.


Proof Suppose there are only finitely many primes of the form $4n + 3$, say $p_1, \ldots, p_k$, with $p_k$ the largest. Let $q_1, \ldots, q_l$ be all the primes of the form $4n + 1$ less than $p_k$. Let

$$x = 4p_1\cdots p_kq_1\cdots q_l - 1.$$

Then $x \equiv -1 \equiv 3 \bmod 4$. Since $x$ is odd and a product of primes all congruent to 1 mod 4 would be congruent to 1 mod 4, $x$ must be divisible by a prime $p \equiv 3 \bmod 4$. But then $p$ is one of the $p_i$, so $p \mid 4p_1\cdots p_kq_1\cdots q_l$ and $p$ cannot divide $x$, a contradiction. Therefore there are infinitely many primes of the form $4n + 3$.

To handle the case $4n + 1$ we must recall some facts about quadratic residues. From Section 2.6 it follows that if $p$ is a prime greater than or equal to 3 then

$$(-1/p) = (-1)^{\frac{p-1}{2}}.$$

Hence $-1$ is a quadratic residue mod $p$ only if $p \equiv 1 \bmod 4$. Equivalently, if $x$ is any positive integer and $p$ is an odd prime with $p \mid (x^2 + 1)$, then $p \equiv 1 \bmod 4$. Now suppose that there are only finitely many primes of the form $4n + 1$, say $q_1, \ldots, q_k$. Let $x = q_1\cdots q_k$ and let $p$ be an odd prime divisor of $x^2 + 1$ (one exists: $x$ is odd, so $x^2 + 1 \equiv 2 \bmod 4$ and $x^2 + 1 > 2$). Then $p \equiv 1 \bmod 4$, so $p$ is one of the $q_i$. But then $p \mid x$, so $p \mid x^2$ and hence $p$ cannot divide $x^2 + 1$. Therefore a contradiction and there must exist infinitely many primes of the form $4n + 1$.
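The two constructions in this proof can be run on small inputs. The sketch (our own helper names; trial-division factoring, so small inputs only) builds $x = 4p_1\cdots p_kq_1\cdots q_l - 1$ and $x^2 + 1$ and returns a prime factor of the required shape, which the argument guarantees exists.

```python
from math import prod


def prime_factors(n):
    """Distinct prime factors of n >= 2, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_3_mod_4(ps, qs):
    """ps: primes ≡ 3 (mod 4), qs: primes ≡ 1 (mod 4). Returns a prime ≡ 3 (mod 4)
    dividing x = 4*prod(ps)*prod(qs) - 1; it cannot be one of ps."""
    x = 4 * prod(ps) * prod(qs) - 1
    return next(p for p in prime_factors(x) if p % 4 == 3)


def new_prime_1_mod_4(qs):
    """qs: primes ≡ 1 (mod 4). Returns an odd prime divisor of x^2 + 1, x = prod(qs);
    by the argument above it is ≡ 1 (mod 4) and not one of qs."""
    x = prod(qs)
    return next(p for p in prime_factors(x * x + 1) if p % 2 == 1)


print(new_prime_3_mod_4([3, 7, 11], [5]))   # 4*3*7*11*5 - 1 = 4619 = 31 * 149
print(new_prime_1_mod_4([5, 13]))
```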


Essentially the same methods handle the situation modulo 8. 


Lemma 3.1.12 There exist infinitely many primes of each of the forms $8n + 1$, $8n + 3$, $8n + 5$ and $8n + 7$.


Proof From the fact that $(2/p) = (-1)^{\frac{p^2-1}{8}}$ if $p \ge 3$ is prime (see Section 2.6) we can obtain the following results whose proofs we leave to the exercises. If $x$ is any positive integer and $p \ge 3$ is a prime then

(1) If $p \mid (x^4 + 1)$ then $p \equiv 1 \bmod 8$.

(2) If $p \mid (x^2 - 2)$ then either $p \equiv 1 \bmod 8$ or $p \equiv 7 \bmod 8$.

(3) If $p \mid (x^2 + 2)$ then either $p \equiv 1 \bmod 8$ or $p \equiv 3 \bmod 8$.


Now suppose that there exist only finitely many primes of the form $8n + 1$, say $p_1, \ldots, p_k$, and let $x = p_1\cdots p_k$. Let $p$ be an odd prime divisor of $x^4 + 1$ (one exists since $x$ is odd, so $x^4 + 1 \equiv 2 \bmod 4$). Then from above $p \equiv 1 \bmod 8$, but $p$ is not one of $p_1, \ldots, p_k$, since these divide $x$, and hence a contradiction. Therefore there exist infinitely many primes of the form $8n + 1$.

Suppose next that there exist only finitely many primes of the form $8n + 7$. As before call them $p_1, \ldots, p_k$ and let $x = p_1\cdots p_k$. Now each $p_i \equiv -1 \bmod 8$ and so $x \equiv \pm 1 \bmod 8$ and so $x^2 \equiv 1 \bmod 8$. Let $p$ be a prime divisor of $x^2 - 2$; it is odd since $x$ is odd. It must be congruent to either 1 or 7 modulo 8. If each prime divisor of $x^2 - 2$ were congruent to 1 mod 8 then $x^2 - 2$ would also be congruent to 1 modulo 8. However $x^2$ is congruent to 1 modulo 8 and so $x^2 - 2 \equiv 7 \bmod 8$. Therefore there must exist a prime divisor $p$ of $x^2 - 2$ congruent to 7 modulo 8. This $p$ cannot be one of $p_1, \ldots, p_k$ (otherwise $p \mid 2$), and hence a contradiction.

The case of the form $8n + 3$ is handled in an analogous manner (see exercises).

To handle the case 8n + 5 we first show the following. 


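Lemma 3.1.13 Let $a, b$ be nonzero integers with $(a, b) = 1$. Then each odd prime divisor of $a^2 + b^2$ is of the form $4n + 1$.


Proof Let $p$ be an odd prime divisor of $a^2 + b^2$. If $p \mid b$ then $p \mid a^2$, so $p \mid a$, which contradicts $(a, b) = 1$. Hence $b$ is invertible mod $p$. Taking $n \equiv ab^{-1} \bmod p$ we obtain

$$n^2 = -1 + kp$$

for some $k \in \mathbb{Z}$. Hence $-1$ is a quadratic residue mod $p$ and therefore $p \equiv 1 \bmod 4$. □

Now suppose that there are only finitely many primes of the form $8n + 5$. Let $p$ be the largest of them and let

$$x = 3^2 \cdot 5^2 \cdot 7^2 \cdots p^2 + 4 = (3 \cdot 5 \cdot 7 \cdots p)^2 + 2^2,$$

where $3, 5, \ldots, p$ are all the odd primes up to $p$ and $p > 7$. Since $(3 \cdot 5 \cdots p, 2) = 1$ and $x$ is odd, Lemma 3.1.13 shows that every prime divisor of $x$ is congruent to 1 modulo 4. Each prime divisor is therefore congruent to either 1 modulo 8 or 5 modulo 8. Since $(2m + 1)^2 + 4 = 4m(m + 1) + 5$ and $m(m + 1)$ is even, $x$ is congruent to 5 modulo 8. If every prime divisor of $x$ were congruent to 1 modulo 8, then $x$ itself would be congruent to 1 modulo 8. Therefore $x$ must have a prime divisor $q$ of the form $8n + 5$. This $q$ is not among $3, 5, \ldots, p$, since otherwise $q$ would divide $x - (3 \cdot 5 \cdots p)^2 = 4$. Hence $q$ is larger than $p$, contradicting the choice of $p$. □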
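For a concrete check of the $8n + 5$ step, the Python sketch below builds $x = (3 \cdot 5 \cdots p)^2 + 4$ for a chosen prime $p$ and factors it. The function names are hypothetical, and trial-division factoring is an assumption made for simplicity. The prime factors congruent to 5 mod 8 are the new primes that the argument produces.

```python
from math import isqrt


def is_prime(q):
    return q >= 2 and all(q % d for d in range(2, isqrt(q) + 1))


def prime_factors(n):
    """Prime factors of n > 1, with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def witness_8n_plus_5(p):
    """x = (3*5*...*p)^2 + 4, the product running over odd primes up to p, and its prime factors."""
    a = 1
    for q in range(3, p + 1):
        if is_prime(q):
            a *= q
    x = a * a + 4
    return x, prime_factors(x)


x, factors = witness_8n_plus_5(13)
assert x % 8 == 5                         # the congruence computed in the text
assert all(q % 4 == 1 for q in factors)   # Lemma 3.1.13
print(x, [q for q in factors if q % 8 == 5])
```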
A slight modification and the use of quadratic reciprocity allow us to handle primes modulo 3.


Lemma 3.1.14 There exist infinitely many primes of the form $3n + 1$ and infinitely many of the form $3n + 2$.


Proof The case $3n + 2$ is handled directly. Suppose that $p_1, \ldots, p_k$ are all the primes congruent to 2 modulo 3 and let $x = p_1 p_2 \cdots p_k$, so that $3 \nmid x$. A positive integer congruent to 2 mod 3 must have a prime divisor congruent to 2 mod 3, because it is not divisible by 3 and a product of primes congruent to 1 mod 3 is again congruent to 1 mod 3. If $x \equiv 1 \bmod 3$ then $x + 1 \equiv 2 \bmod 3$, so some prime $p \equiv 2 \bmod 3$ divides $x + 1$. But $p$ is one of the $p_i$, so $p \mid (p_1 \cdots p_k) = x$ and therefore $p$ cannot divide $x + 1$.

If $x \equiv 2 \bmod 3$, then $x + 3 \equiv 2 \bmod 3$. Then as before there must be a prime $p \equiv 2 \bmod 3$ dividing $x + 3$. But $p \mid x$ and $p \ne 3$, so $p$ cannot divide $x + 3$. These two contradictions then imply that there are infinitely many primes of the form $3n + 2$.

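To handle $3n + 1$ we must use quadratic reciprocity. Consider, for an odd prime $p \ne 3$,

$$\left(\frac{-3}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{3}{p}\right).$$

Now $(-1/p) = (-1)^{\frac{p-1}{2}}$, and $(3/p) = (-1)^{\frac{p-1}{2}}(p/3)$ by quadratic reciprocity. Therefore

$$\left(\frac{-3}{p}\right) = (-1)^{\frac{p-1}{2}}(-1)^{\frac{p-1}{2}}\left(\frac{p}{3}\right) = \left(\frac{p}{3}\right).$$

Directly then

$$\left(\frac{p}{3}\right) = 1 \text{ if } p \equiv 1 \bmod 3, \qquad \left(\frac{p}{3}\right) = -1 \text{ if } p \equiv -1 \bmod 3.$$

Therefore $-3$ is a quadratic residue mod $p$ only if $p \equiv 1 \bmod 3$. Equivalently, for any integer $x$, any prime divisor of $x^2 + 3$ other than 2 and 3 must be congruent to 1 mod 3.

Now suppose that there are only finitely many primes of the form $3n + 1$, say $p_1, \ldots, p_k$. Let $x = 2p_1 \cdots p_k$ and let $p$ be a prime divisor of $x^2 + 3$. Since $x$ is even and not divisible by 3, $x^2 + 3$ is odd and not divisible by 3. Hence $p \ne 2, 3$, and so $p \equiv 1 \bmod 3$. But as before $p$ cannot be one of the $p_i$, since each $p_i$ divides $x$ and $p_i \ne 3$. Hence there are infinitely many primes of the form $3n + 1$. □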
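The same kind of experiment works for $3n + 1$. This Python sketch, with hypothetical names and trial-division factoring assumed, forms $x = 2q_1 \cdots q_k$ from a list of primes $q_i \equiv 1 \bmod 3$ and factors $x^2 + 3$. Every factor should be $\equiv 1 \bmod 3$ and new.

```python
def prime_factors(n):
    """Distinct prime factors of n > 1, in increasing order, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_primes_1_mod_3(known):
    """Prime factors of x^2 + 3 with x = 2 * q_1 * ... * q_k, for primes q_i = 1 mod 3."""
    x = 2
    for q in known:
        x *= q
    return prime_factors(x * x + 3)


print(new_primes_1_mod_3([7]))      # x = 14: x^2 + 3 = 199, itself prime
print(new_primes_1_mod_3([7, 13]))  # every factor is 1 mod 3 and differs from 7 and 13
```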


The methods used in the preceding lemmas can handle many other special situa- 
tions of Dirichlet’s Theorem, for example 6n + 5. However they cannot be extended 
to the whole result. We close this section with one general result which can be proved 
with the same kinds of elementary methods. The proof of this result is taken from 
[NZ] which was a modification of a result in [NP]. 


Theorem 3.1.12 Let m be a positive integer. Then there exist infinitely many primes 
of the form mn + 1. 


Proof The theorem is actually a consequence of the next lemma which is interesting 
in its own right. 


Lemma 3.1.15 Given a positive integer $m$, there exists a prime divisor of $m^m - 1$ which is congruent to 1 modulo $m$.
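Proof (Lemma 3.1.15) The case $m \le 2$ is immediate; for example, for $m = 2$ the prime $3 = 2^2 - 1$ is congruent to 1 modulo 2. So we may assume $m \ge 3$. Suppose that there is no prime $p \equiv 1 \bmod m$ such that $p \mid (m^m - 1)$. For any prime factor $q$ of $m^m - 1$ let $h$ be the order of $m$ modulo $q$, that is, $h$ is the smallest positive integer such that $m^h \equiv 1 \bmod q$. Since the nonzero elements of $\mathbb{Z}_q$ form a multiplicative group, it follows that $h \mid q - 1$ and $h \mid m$ (see Chapter 2). If $h = m$ then $m \mid (q - 1)$ and $q \equiv 1 \bmod m$, contrary to the assumption above. Therefore $h \ne m$ and $m = hc$ with $c > 1$. Under the assumption this holds for every prime divisor $q$ of $m^m - 1$, with possibly different $h$ and $c$.

Suppose $q^e$ is the highest power of $q$ dividing $m^h - 1$. Then

$$m^m - 1 = m^{hc} - 1 = (m^h - 1)\left(m^{h(c-1)} + m^{h(c-2)} + \cdots + m^h + 1\right).$$

Since $m^h \equiv 1 \bmod q$ we have

$$m^{h(c-1)} + m^{h(c-2)} + \cdots + m^h + 1 \equiv 1 + 1 + \cdots + 1 = c \bmod q.$$

But $q$ is a divisor of $m^m - 1$, so $q$ is not a divisor of $m$ or of its divisor $c$. Hence $q$ does not divide $m^{h(c-1)} + m^{h(c-2)} + \cdots + m^h + 1$. Therefore $q^e$ is also the highest power of $q$ dividing $m^m - 1$. Further, the same argument shows that if $h \mid s$ and $s \mid m$, then $q^e$ is also the highest power of $q$ dividing $m^s - 1$. If instead $h \nmid s$, then $q \nmid m^s - 1$ by the definition of $h$.

Given a prime divisor $q$ of $m^m - 1$, let $h, c$ be defined as above. Let the distinct prime divisors of $c$ and of $m$ be

$$p_1, \ldots, p_k \quad \text{and} \quad p_1, \ldots, p_k, p_{k+1}, \ldots, p_n \quad \text{respectively},$$

with $1 \le k \le n$. Then $h$ divides each of $m/p_1, \ldots, m/p_k$, but $h$ is not a divisor of any of the integers

$$\frac{m}{p_{k+1}}, \frac{m}{p_{k+2}}, \ldots, \frac{m}{p_n},$$

because for $j > k$ the full power of $p_j$ dividing $m$ already divides $h = m/c$.

Consider the integers of the form

$$\frac{m}{p_{i_1} p_{i_2} \cdots p_{i_t}},$$

where $t \ge 1$ and $1 \le i_1 < i_2 < \cdots < i_t \le n$. Let $T$ be the set of integers of this form with $t$ odd and $U$ the set with $t$ even. Define

$$Q = \frac{\prod_{s \in T}(m^s - 1)}{\prod_{s \in U}(m^s - 1)}.$$

We show that $Q = m^m - 1$ and then show that this is impossible. This leads to a contradiction, and hence there must be a prime divisor congruent to 1 mod $m$.

To show first that $Q = m^m - 1$ we show that the prime power factors are the same. Each exponent $s$ appearing in $Q$ divides $m$, so each $m^s - 1$ divides $m^m - 1$, and hence we need only consider prime factors of $m^m - 1$. Fix a prime divisor $q$ of $m^m - 1$, with $h$, $c$, $e$ and the ordering $p_1, \ldots, p_n$ chosen as above. If the corresponding $i_t > k$ then $h$ does not divide $s$. On the other hand, if $i_t \le k$ then $h \mid s$ and the highest power of $q$ dividing $m^s - 1$ is $q^e$, as shown above. Therefore $q$ is a divisor of a term $m^s - 1$ in $Q$ if and only if $h \mid s$, and this is true if and only if $i_t \le k$. In that case the term contributes exactly $q^e$. The number of factors $m^s - 1$ in the numerator of $Q$ having $i_t \le k$ is

$$\binom{k}{1} + \binom{k}{3} + \binom{k}{5} + \cdots \qquad (3.1.5.1)$$

Similarly the number of factors $m^s - 1$ in the denominator of $Q$ having $i_t \le k$ is

$$\binom{k}{2} + \binom{k}{4} + \binom{k}{6} + \cdots \qquad (3.1.5.2)$$

If we subtract (3.1.5.2) from (3.1.5.1) we get $\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots = 1 - (1 - 1)^k$, which has value 1 since $k \ge 1$. It follows that the highest power of $q$ dividing $Q$ is $q^e$, exactly as for $m^m - 1$. Since this holds for every prime divisor $q$ of $m^m - 1$ and no other primes occur, $Q$ must be an integer, and it must be the case that $Q = m^m - 1$.

We now show that this is impossible. Rewriting $Q = m^m - 1$ we get

$$(m^m - 1)\prod_{s \in U}(m^s - 1) = \prod_{s \in T}(m^s - 1).$$

Let $b$ be the smallest integer of the form $\frac{m}{p_{i_1} \cdots p_{i_t}}$, namely $b = \frac{m}{p_1 p_2 \cdots p_n}$, and consider the above equation modulo $m^{b+1}$. Every factor $m^s - 1$ with $s > b$, including $m^m - 1$, is congruent to $-1$ modulo $m^{b+1}$. Thus every factor is congruent to $-1$ except $m^b - 1$. Therefore the above equation reduces to

$$\pm(m^b - 1) \equiv \pm 1 \bmod m^{b+1}.$$

This then implies that

$$m^b \equiv 0 \bmod m^{b+1} \quad \text{or} \quad m^b \equiv 2 \bmod m^{b+1}.$$

Both of these congruences are impossible, since $b$ is positive and $m \ge 3$, so that $0 < m^b - 2 < m^b < m^{b+1}$. This contradiction establishes Lemma 3.1.15. □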
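Lemma 3.1.15 is easy to test for small $m$, because $m^m - 1$ can still be factored by trial division. The Python sketch below (hypothetical function names) lists the prime divisors of $m^m - 1$ that are congruent to 1 modulo $m$. The lemma asserts that this list is never empty.

```python
def distinct_prime_factors(n):
    """Distinct prime factors of n > 1, in increasing order, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def primes_1_mod_m_dividing(m):
    """Prime divisors q of m^m - 1 with q = 1 mod m."""
    return [q for q in distinct_prime_factors(m**m - 1) if q % m == 1]


for m in range(2, 9):
    print(m, primes_1_mod_m_dividing(m))
```

For example $6^6 - 1 = 5 \cdot 7 \cdot 31 \cdot 43$, and $7$, $31$ and $43$ are all congruent to 1 modulo 6.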


We now prove Theorem 3.1.12. 


Proof We want to show that, given $m$, there are infinitely many primes of the form $mn + 1$. From Lemma 3.1.15 we know that the progression $1 + m, 1 + 2m, \ldots$ contains a prime that is a divisor of $m^m - 1$. Since this holds for any $m$, it follows that every arithmetic progression $1 + M, 1 + 2M, \ldots$ must contain a prime. Suppose then that for some $m$ there are only finitely many primes of the form $mn + 1$, and let $P$ be the product of these primes. From the observation above with $M = mP$ there is a prime $q$ in the arithmetic progression $1 + mP, 1 + 2mP, \ldots, 1 + nmP, \ldots$. This prime is congruent to 1 modulo $m$. However, it is not a divisor of the product $P$, since $q \equiv 1 \bmod P$. This is a contradiction, and hence there must be infinitely many primes of the form $nm + 1$. □


We note that the proof can be modified to show also that there are infinitely many primes of the form $nm - 1$.


3.1.6 A Topological Proof and a Proof Using Codes 


We close this section on elementary proofs of the infinitude of primes by presenting several more: one topological, one using codes, and two more elementary analytic proofs.

We first look at the topological proof which is due to H. Furstenberg [Fu]. 


Proof Using Topology 

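We introduce a topology on the integers $\mathbb{Z}$. As a basis for the topology we take all two-sided arithmetic progressions $\{a + nb : n \in \mathbb{Z}\}$ with $b \ge 1$. Each arithmetic progression is then open. It is also closed, since its complement is the union of the other $b - 1$ progressions with the same difference $b$. Hence each finite union of arithmetic progressions is closed. Note also that every nonempty open set contains an arithmetic progression and is therefore infinite.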

Now let $A_p$ be the arithmetic progression consisting of the multiples of a prime $p$, that is

$$A_p = \{\ldots, -np, \ldots, -p, 0, p, \ldots, np, \ldots\}, \quad n \in \mathbb{N}.$$

Now let $A = \bigcup_p A_p$, where this union is taken over all primes $p$. Since every integer other than $\pm 1$ has a prime divisor, the complement of $A$ is $\{-1, 1\}$. Since $\{-1, 1\}$ is finite, it is not open, so $A$ is not closed. Hence $A$ cannot be a finite union of closed sets. Therefore the number of primes must be infinite. □


A variation of this was given by S. Golomb [Go]. As a basis for the topology take all arithmetic progressions $an + b$ from $-\infty$ to $\infty$. Let $A_{np}$ be those arithmetic progressions consisting of multiples of $np$, where $n$ is a positive integer and $p$ is a prime. The progression $\{np\}$ with $p$ a prime is closed, and $X = \bigcup_p A_{np}$ is not closed. Then, in the same manner as above, the number of primes must be infinite.


We next give a proof using codes which is due to I. Stewart. We first need the 
following theorem. 


Theorem 3.1.13 If we have a finite set of $2^N$ elements and map it bijectively onto a set of binary strings, then at least one string has length $\ge N$.


Proof There are only $2^N - 1$ binary strings of length $< N$: the empty string, two of length 1, four of length 2, ..., $2^{N-1}$ of length $N - 1$, and $1 + 2 + \cdots + 2^{N-1} = 2^N - 1$. □


Now we can give our proof using codes. 


Proof Using Codes
Assume that the set of primes is finite, say $\{p_1, \ldots, p_r\}$. We introduce a code via strings for each natural number together with zero. For 0 we choose the symbol 0. For each natural number $n$ we write it as a product of primes, $n = p_1^{a_1} \cdots p_r^{a_r}$. For each prime $p_1, \ldots, p_r$ in turn we then write down its multiplicity $a_i$ in the product, itself coded in the same way. For the listing of these multiplicities we use brackets to start and end a listing. Suppose $r = 5$, so that the primes are $2, 3, 5, 7, 11$. Then we get the following codes for the first few natural numbers:

0 ↔ 0
1 ↔ [00000]
2 ↔ [[00000]0000]
3 ↔ [0[00000]000]
4 ↔ [[[00000]0000]0000]
5 ↔ [00[00000]00]
6 ↔ [[00000][00000]000]

To analyze these codes we shorten each representation by canceling the closing brackets and taking 1 for the starting bracket. The closing brackets carry no information, since every listing contains exactly $r$ entries. Hence

0 ↔ 0
1 ↔ 100000
2 ↔ 11000000000
3 ↔ 10100000000
4 ↔ 1110000000000000
5 ↔ 10010000000
6 ↔ 1100000100000000

We next need the following lemma. 


Lemma 3.1.16 Assume that the first $N$ nonnegative integers are all coded by strings of length less than $t$. Then the first $2^N$ nonnegative integers are all coded by strings of length less than $rt$.


Proof In their prime factorization, the first $2^N$ natural numbers contain the factor 2 fewer than $N$ times. Analogously, all $r$ multiplicities in the decomposition are less than $N$. By assumption all these multiplicities have codes of length less than $t$. Each code therefore consists of the leading 1 followed by $r$ strings of length at most $t - 1$, so its length is at most $1 + r(t - 1) < rt$ (recall that $r \ge 2$, since 2 and 3 are primes). This gives the result. □


We now show that $r$ finite leads to a contradiction. If $N = 1$ then we can choose $t = 2$, since the only integer involved is 0 and the length of the string 0 is 1, which is less than 2. Using the above lemma we obtain by induction that the first

$$\underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{t \text{ times}}$$

natural numbers are all coded with strings of length less than $2r^t$. Choose $t = t_0$ large enough so that

$$\log_2\Big(\underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{t_0 \text{ times}}\Big) = \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{t_0 - 1 \text{ times}} > 2r^{t_0}.$$

It follows that for

$$N_0 = \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{t_0 - 1 \text{ times}}$$

the first $2^{N_0}$ natural numbers can be coded by strings with length less than $2r^{t_0} < N_0$. Since different numbers receive different codes, this contradicts Theorem 3.1.13, showing that there must be infinitely many primes. □
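To get a feeling for the size of $t_0$, the Python sketch below computes the smallest $t_0$ satisfying $\underbrace{2^{\cdot^{\cdot^{2}}}}_{t_0 - 1} > 2r^{t_0}$ for a hypothetical number $r$ of primes. The helper names are illustrative. Python integers are unbounded, so `tower(5)` $= 2^{65536}$ is computed exactly. For any $r$ of reasonable size the loop stops by $t_0 = 6$ and never needs a taller tower.

```python
def tower(k):
    """2^2^...^2 with k twos; tower(0) = 1."""
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value


def choose_t0(r):
    """Smallest t0 >= 1 with tower(t0 - 1) > 2 * r**t0, the choice made in the proof."""
    t0 = 1
    while tower(t0 - 1) <= 2 * r ** t0:
        t0 += 1
    return t0


t0 = choose_t0(5)
print(t0, tower(t0 - 1), 2 * 5 ** t0)
```

With $r = 5$ this gives $N_0 = 65536$. So already the first $2^{65536}$ natural numbers could not all receive codes of the required length.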

The next proof is analytic and uses Stirling’s approximation along with a formula 
due to Legendre. This proof appears in the book by Apostol [A]. 


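Proof Using Stirling’s Approximation
Stirling’s approximation for $n!$ is given by (see [A])

$$n! \approx \left(\frac{n}{e}\right)^n \sqrt{2\pi n} \quad \text{for large } n.$$

It follows then that

$$\lim_{n \to \infty} (n!)^{1/n} = \infty.$$

For $n > 1$ we have

$$n! = \prod_{p \le n} p^{e_p(n!)},$$

where $p$ runs over all the primes less than or equal to $n$, and $e_p(n!)$ is the exponent of $p$ in $n!$. From a formula of Legendre (see [A]), $e_p(n!) = \sum_{k > 0,\ p^k \le n} [n/p^k]$. Now (see Cohen [C])

$$e_p(n!) = \sum_{k > 0,\ p^k \le n} \left[\frac{n}{p^k}\right] \le n\sum_{k=1}^{\infty} \frac{1}{p^k} = \frac{n}{p - 1}.$$

It follows that

$$(n!)^{1/n} = \prod_{p \le n} p^{e_p(n!)/n} \le \prod_{p \le n} p^{1/(p-1)}.$$

If the number of primes is finite, it follows from the above that $(n!)^{1/n}$ stays bounded as $n \to \infty$, contradicting the Stirling approximation. □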
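Both ingredients of this proof can be checked numerically. The Python sketch below uses hypothetical helper names. It implements Legendre's formula, verifies the exact inequality $e_p(n!)(p - 1) < n$, and compares $\log (n!)^{1/n}$, computed with `math.lgamma`, against $\sum_{p \le n} \log p/(p - 1)$.

```python
from math import factorial, isqrt, lgamma, log


def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False] * min(2, n + 1)
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]


def legendre_exponent(n, p):
    """Exponent of the prime p in n!: the sum of floor(n / p^k) over k >= 1."""
    e, pk = 0, p
    while pk <= n:
        e += n // pk
        pk *= p
    return e


def check_bound(n):
    """e_p(n!) < n/(p-1) for all p <= n, and log (n!)^(1/n) <= sum_{p<=n} log(p)/(p-1)."""
    ps = primes_up_to(n)
    exact = all(legendre_exponent(n, p) * (p - 1) < n for p in ps)
    return exact and lgamma(n + 1) / n <= sum(log(p) / (p - 1) for p in ps)


n = 10
exps = {p: legendre_exponent(n, p) for p in primes_up_to(n)}
product = 1
for p, e in exps.items():
    product *= p ** e
assert product == factorial(n)
print(exps, all(check_bound(k) for k in range(2, 200)))
```

The left side grows roughly like $\log n - 1$. The right side grows only as long as new primes keep appearing, which is exactly the tension the proof exploits.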


Proof Another Analytic Proof
This appears in the book of P. Ribenboim [Ri]. Assume that there are only finitely many prime numbers

$$p_1 < p_2 < \cdots < p_r.$$

Suppose $t \in \mathbb{N}$ and let $N = p_r^t$. Each $m \le N$ in $\mathbb{N}$ can be written as

$$m = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} \quad \text{with } a_i \ge 0,$$

and the sequence $(a_1, \ldots, a_r)$ is unique. We then have

$$2^{a_i} \le p_i^{a_i} \le m \le N = p_r^t.$$

Let $E = \frac{\log p_r}{\log 2}$. Then $a_i \le tE$.

On the other hand, $N$ is at most equal to the number of sequences $(a_1, \ldots, a_r)$ with $0 \le a_i \le tE$. Hence

$$p_r^t = N \le (tE + 1)^r \le t^r (E + 1)^r.$$

The left side grows exponentially in $t$ while the right side grows only polynomially. This gives a contradiction for $t$ sufficiently large, showing that there must be infinitely many primes. □
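The following Python sketch makes the counting concrete for a hypothetical finite list of primes. The names are illustrative, and the constant $E = \log p_r/\log 2$ follows the reconstruction above. It finds the first $t$ at which $p_r^t$ exceeds the number $(tE + 1)^r$ of admissible exponent vectors. It also counts how many integers up to $N$ really factor over the list.

```python
from math import log


def smooth_count(primes, N):
    """How many m in 1..N have all their prime factors in `primes`."""
    count = 0
    for m in range(1, N + 1):
        k = m
        for p in primes:
            while k % p == 0:
                k //= p
        count += k == 1
    return count


def first_contradiction(primes):
    """Smallest t with p_r**t > (t*E + 1)**r, where E = log p_r / log 2."""
    r, pr = len(primes), max(primes)
    E = log(pr) / log(2)
    t = 1
    while pr ** t <= (t * E + 1) ** r:
        t += 1
    return t


t = first_contradiction([2, 3])
print(t, 3 ** t, smooth_count([2, 3], 3 ** t))
```

If 2 and 3 were the only primes, then for $t = 4$ every one of the 81 integers up to $3^4$ would have to be of the form $2^a 3^b$. Only 19 of them are.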


3.2 Sums of Squares 


As we described in our historical overview, much of the outline of the formal study of number theory was laid out in Gauss’ work Disquisitiones Arithmeticae. He rested the study of number theory on three pillars: the theory of congruences, which we discussed in Chapter 2; the theory of algebraic integers, which we will discuss in Chapter 6; and the theory of forms. In particular, relative to this last topic, Gauss considered the question of when an integer $n$ can be represented by a quadratic form in other integers.
Here an (integral) quadratic form in $n$ variables is a polynomial

$$f(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n} b_i x_i + c,$$

where each $a_{ij}$, $b_i$ and $c$ are integers. A form is a positive form if the substitution of any integers other than $(0, 0, \ldots, 0)$ leads to a positive value. It is a negative form if the substitution of any integers other than $(0, 0, \ldots, 0)$ leads to a negative value. It is a definite form if it is either positive or negative. For example, $f(x, y) = x^2 + y^2$ is a positive definite form.

In particular, in two variables a quadratic form has the representation

$$f(x, y) = ax^2 + bxy + cy^2,$$

where $a, b, c$ are integers. The following lemma describes when such forms are positive definite.


Lemma 3.2.1 The quadratic form $f(x, y) = ax^2 + bxy + cy^2$ is positive definite if and only if the discriminant $b^2 - 4ac$ is negative and $a > 0$, $c > 0$.


Proof Suppose first that $f(x, y)$ is positive definite. Then $f(1, 0) = a > 0$ and $f(0, 1) = c > 0$. To show that the discriminant must be negative, notice that $f(x, y)$ may be rewritten as

$$f(x, y) = \frac{1}{4a}\left((2ax + by)^2 + (4ac - b^2)y^2\right).$$

Using this rewritten form we see that $f(-b, 2a) = (4ac - b^2)a$. Since this must be positive and $a > 0$, it follows that $4ac - b^2 > 0$, and hence the discriminant is negative.

Conversely, suppose that the discriminant is negative and $a > 0$, $c > 0$. From the rewritten form for $f(x, y)$ above it is clear that $f(x, y) \ge 0$ for all integral pairs $(x, y)$. If $f(x, y) = 0$ it follows that $2ax + by = 0$ and $(4ac - b^2)y^2 = 0$, from which one easily obtains that $x = y = 0$. Therefore $f(x, y)$ is positive definite. □
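The criterion of Lemma 3.2.1 can be compared with a direct search. The Python sketch below does this for every form with coefficients in $\{-3, \ldots, 3\}$. The function names and the search radius are assumptions. The box search can only ever give evidence of positivity, while the lemma gives a proof.

```python
from itertools import product


def positive_definite(a, b, c):
    """Criterion of Lemma 3.2.1 for f(x, y) = a x^2 + b x y + c y^2."""
    return b * b - 4 * a * c < 0 and a > 0 and c > 0


def positive_on_box(a, b, c, radius=6):
    """f > 0 at every nonzero integer point with |x|, |y| <= radius (evidence only)."""
    return all(
        a * x * x + b * x * y + c * y * y > 0
        for x, y in product(range(-radius, radius + 1), repeat=2)
        if (x, y) != (0, 0)
    )


coeffs = range(-3, 4)
agree = all(
    positive_definite(a, b, c) == positive_on_box(a, b, c)
    for a in coeffs for b in coeffs for c in coeffs
)
print(agree)
```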


A quadratic form $f(x_1, \ldots, x_n)$ represents an integer $m$ if there exist integers $b_1, \ldots, b_n$ such that $f(b_1, \ldots, b_n) = m$.

In this section we will look at the quadratic form question. Specifically, we will consider the question of when an integer is represented as a sum of squares.


3.2.1 Pythagorean Triples 


The oldest occurrence of sum of squares questions arises from integral solutions 
of the Pythagorean Theorem. Recall that a right triangle can have integral sides, 
for example (3, 4,5) or (5, 12, 13). The question naturally arises as to finding, if 
possible, all such integer right triangles. 


Definition 3.2.1 A pythagorean triple is a triple $(a, b, c)$ of integers with $a^2 + b^2 = c^2$. We consider $c$ fixed and consider the triple $(a, b, c)$ equivalent to the triple $(b, a, c)$. A pythagorean triple $(a, b, c)$ is called primitive if $a, b, c$ are coprime.


Now if $a^2 + b^2 = c^2$ then $(da)^2 + (db)^2 = (dc)^2$ for any integer $d$. Clearly then, for the classification of pythagorean triples it is enough to consider primitive triples. The following theorem, which in essence appeared in Diophantus’ book Arithmetica written about 250 A.D., gives a complete classification of primitive pythagorean triples.


Theorem 3.2.1 If $n$ and $m$ are two relatively prime positive integers with $n - m > 0$ and $n - m$ odd, then $(2mn, n^2 - m^2, n^2 + m^2)$ is a primitive pythagorean triple. Further, any primitive pythagorean triple can be obtained in this way.
Proof Straightforward calculations show that if $a = 2nm$, $b = n^2 - m^2$ and $c = n^2 + m^2$ with $(n, m) = 1$ and $n - m = 2k + 1 > 0$, then $(a, b, c)$ forms a primitive pythagorean triple (see the exercises).

Conversely, we must show that any primitive pythagorean triple is obtained in this manner. Let $(a, b, c)$ be a primitive pythagorean triple. Since $a, b, c$ are coprime and $a^2 + b^2 = c^2$, it is easy to see that these integers must also be pairwise coprime. Hence no two can be even. Further, suppose that both $a$ and $b$ are odd, so that $a = 2m + 1$, $b = 2n + 1$. Then

$$c^2 = a^2 + b^2 = (2m + 1)^2 + (2n + 1)^2 = 2(2m^2 + 2n^2 + 2m + 2n + 1).$$

Then $c^2$ is even but $c^2$ is not divisible by 4, which is impossible. Hence $a$ and $b$ cannot both be odd. It follows that one of $a, b$ must be even and the other odd, and then $c$ is odd.

Now suppose $a$ is even and $b$ and $c$ are both odd. Then $c + b$ and $c - b$ are both even. Let

$$c + b = 2u \quad \text{and} \quad c - b = 2v.$$

This implies directly that

$$b = u - v \quad \text{and} \quad c = u + v.$$

Further $(u, v) = 1$, for otherwise $(b, c) \ne 1$. We now have

$$a^2 = c^2 - b^2 = (c + b)(c - b) = 4uv.$$

Since $a$ is even, $a = 2w$, which implies from the above that $w^2 = uv$, and hence $uv$ is a perfect square. Since $(u, v) = 1$, it is then an easy consequence of the Fundamental Theorem of Arithmetic that both $u$ and $v$ must also be perfect squares (see Exercise 2.31). Hence $u = n^2$, $v = m^2$ with $n, m$ positive. Therefore we have

$$a = 2mn, \quad b = n^2 - m^2, \quad c = n^2 + m^2.$$

Thus $(a, b, c)$ has the required form, and we must show that $n, m$ have the required properties.

Since $(u, v) = 1$ it follows that $(m, n) = 1$. Since $b > 0$ it follows that $u > v$. This implies that $n^2 > m^2$, which gives $n > m$ since both are positive. Since $(m, n) = 1$, $m$ and $n$ cannot both be even. They also cannot both be odd, since then $b$ would be even. Therefore $n - m$ is odd, completing the proof. □
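The parametrization in Theorem 3.2.1 translates directly into a generator. The Python sketch below uses a hypothetical function name. It lists the triples $(2mn, n^2 - m^2, n^2 + m^2)$ for $n$ up to a bound, with the even leg first as in the theorem. It can be cross-checked against a brute-force search for primitive triples.

```python
from math import gcd


def primitive_triples(limit):
    """(2mn, n^2 - m^2, n^2 + m^2) for n <= limit, n > m > 0, (n, m) = 1, n - m odd."""
    triples = []
    for n in range(2, limit + 1):
        for m in range(1, n):
            if gcd(n, m) == 1 and (n - m) % 2 == 1:
                triples.append((2 * m * n, n * n - m * m, n * n + m * m))
    return triples


print(primitive_triples(4))
```

Every primitive triple with hypotenuse at most 50 needs $n^2 < 50$, that is $n \le 7$. So `primitive_triples(7)` must contain all of them.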


There are many other questions concerning pythagorean triples that have been considered. For example, we may ask when the $(3, 4, 5)$ or $(5, 12, 13)$ situation arises, that is, when does the hypotenuse differ from one of the legs by 1 or by some fixed number $d$ (see the exercises). Further, as a corollary of the classification we get the following, which is a special case of Fermat’s Big Theorem and illustrates what has been called Fermat’s method of descent. It is believed that Fermat’s supposed proof of the big theorem was based on this technique.


Corollary 3.2.1 The equation $x^4 + y^4 = z^2$ has no solutions in natural numbers. In particular, the equation $x^4 + y^4 = z^4$ has no solutions in natural numbers.


Proof Assume that there is a solution to $x^4 + y^4 = z^2$ in natural numbers $(x_0, y_0, z_0)$. We then construct a further solution $(x_1, y_1, z_1)$ with $z_1 < z_0$. As in the classification theorem, we may assume that $x_0, y_0, z_0$ are coprime, and then $(x_0^2, y_0^2, z_0)$ is a primitive pythagorean triple. As in the proof of the classification, one of $x_0, y_0$ must be even and the other odd, and $z_0$ is then odd. Suppose then that $y_0$ is even. Then from the classification theorem there exist natural numbers $a, b$ with $(a, b) = 1$ and

$$x_0^2 = a^2 - b^2, \quad y_0^2 = 2ab, \quad z_0 = a^2 + b^2.$$

Here $a$ cannot be even. Otherwise $b$ would be odd and it would follow that $x_0^2 \equiv -1 \equiv 3 \bmod 4$, which is impossible for a square. Hence $a$ is odd, $b$ is even and $x_0^2 + b^2 = a^2$. This implies that $(x_0, b, a)$ is a primitive pythagorean triple with $b$ even. It follows again from the classification theorem that

$$x_0 = c^2 - d^2, \quad b = 2cd, \quad a = c^2 + d^2$$

for coprime positive integers $c, d$ with $c > d$ and $c + d$ odd. Since $(a, b) = 1$ we obtain that $c$, $d$ and $c^2 + d^2$ are pairwise coprime, that is

$$(c, d) = (c, c^2 + d^2) = (d, c^2 + d^2) = 1.$$

Consider the equation

$$\left(\frac{y_0}{2}\right)^2 = \frac{ab}{2} = cd(c^2 + d^2).$$

A product of pairwise coprime positive integers is a square only if each factor is a square. So from this equation we get a pairwise coprime triple $(x_1, y_1, z_1)$ of natural numbers with

$$x_1^2 = c, \quad y_1^2 = d, \quad z_1^2 = c^2 + d^2.$$

This in turn implies that

$$x_1^4 + y_1^4 = c^2 + d^2 = z_1^2,$$

and hence this triple gives another solution to the original equation. From

$$z_1 \le z_1^2 = c^2 + d^2 = a < a^2 + b^2 = z_0$$

it follows that $z_1 < z_0$. Therefore, if we assume that there is a solution $(x_0, y_0, z_0) \in \mathbb{N}^3$ of the equation $x^4 + y^4 = z^2$, then we can construct an infinite sequence $(x_k, y_k, z_k)$, $k = 0, 1, 2, \ldots$, of solutions with $z_0 > z_1 > z_2 > \cdots > 0$. However, by the well ordering of the natural numbers the set of the $z_k$ must have a minimal element. Hence this is impossible, which gives a contradiction. □
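A computer search cannot replace the descent argument, but it is a cheap sanity check on the statement. The Python sketch below (hypothetical function name) looks for $x^4 + y^4 = z^2$ with $1 \le x \le y \le 200$, using the exact integer square root `math.isqrt` to test whether a number is a perfect square.

```python
from math import isqrt


def solutions_x4_y4_z2(limit):
    """All (x, y, z) with 1 <= x <= y <= limit and x^4 + y^4 = z^2."""
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            s = x**4 + y**4
            z = isqrt(s)
            if z * z == s:
                found.append((x, y, z))
    return found


print(solutions_x4_y4_z2(200))  # Corollary 3.2.1 predicts an empty list
```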
3.2.2 Fermat’s Two-Square Theorem


We have completely classified pythagorean triples $(a, b, c)$ with $c^2 = a^2 + b^2$. We now consider the question of when an integer $n$, not necessarily a square, can be written as a sum of squares. That is, given $n$, when is $n = a^2 + b^2$ for integers $a, b$? In the language of forms, we are asking when an integer $n$ can be represented by the quadratic form $f(x, y) = x^2 + y^2$. The basic result is the following, generally called Fermat’s Two-Square Theorem.


Theorem 3.2.2 (Fermat’s Two-Square Theorem) Let $n > 0$ be a natural number. Then $n = a^2 + b^2$ with $(a, b) = 1$ if and only if $-1$ is a quadratic residue modulo $n$.


In this section we lay out a purely number theoretic proof of this theorem. In the course of developing this proof we will give several equivalent formulations of the theorem. In the next section we give a separate proof using the structure of the Modular Group $M = PSL_2(\mathbb{Z})$ (see the next section for an explanation). This second proof is interesting since it is in some sense independent of number theory.

We first consider the case of primes. 


Lemma 3.2.2 $-1$ is a quadratic residue modulo a prime $p$ if and only if $p = 2$ or $p \equiv 1 \bmod 4$.


Proof If $p = 2$ then $-1 \equiv 1 = 1^2 \bmod 2$, and so $-1$ is a quadratic residue mod 2. Consider $p$ now to be an odd prime. By Wilson’s Theorem (Theorem 2.4.5) we have

$$(p - 1)! \equiv -1 \bmod p \implies \left(1 \cdot 2 \cdots \frac{p-1}{2}\right)\left(\frac{p+1}{2} \cdots (p - 1)\right) \equiv -1 \bmod p.$$

Now each number in the product $\frac{p+1}{2} \cdots (p - 1)$ is the negative modulo $p$ of a number in the product $1 \cdot 2 \cdots \frac{p-1}{2}$. For example, modulo $p$ we have $-1 \equiv p - 1$, $-2 \equiv p - 2$ and so on. Therefore we can rewrite Wilson’s Theorem as

$$\left(1 \cdot 2 \cdots \frac{p-1}{2}\right)\left(-\frac{p-1}{2}\right) \cdots (-1) \equiv -1 \bmod p.$$

But this implies

$$(-1)^{\frac{p-1}{2}}\left(1 \cdot 2 \cdots \frac{p-1}{2}\right)^2 \equiv -1 \bmod p.$$

Let $x = 1 \cdot 2 \cdots \frac{p-1}{2} \bmod p$. If $p \equiv 1 \bmod 4$ then $\frac{p-1}{2}$ is even and $(-1)^{\frac{p-1}{2}} = 1$. Hence

$$x^2 \equiv -1 \bmod p$$

and $-1$ is a quadratic residue mod $p$.

Conversely, suppose $x^2 \equiv -1 \bmod p$ has a solution $x_0$. Then

$$x_0^2 \equiv -1 \bmod p \implies \left(x_0^2\right)^{\frac{p-1}{2}} \equiv (-1)^{\frac{p-1}{2}} \bmod p.$$

But $\left(x_0^2\right)^{\frac{p-1}{2}} = x_0^{p-1} \equiv 1 \bmod p$ by Fermat’s theorem. It follows that $(-1)^{\frac{p-1}{2}} \equiv 1 \bmod p$. Since $p$ is an odd prime, $-1$ is not congruent to 1 mod $p$. So the above implies that $\frac{p-1}{2}$ is even and $p \equiv 1 \bmod 4$, completing the proof. □
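The first half of this proof is an explicit construction: for $p \equiv 1 \bmod 4$, the number $\left(\frac{p-1}{2}\right)! \bmod p$ is a square root of $-1$. A direct Python transcription follows, under a hypothetical function name. It uses $O(p)$ multiplications, so it is meant for illustration rather than for large primes.

```python
def sqrt_of_minus_one(p):
    """((p-1)/2)! mod p for a prime p = 1 mod 4; by the proof above its square is -1 mod p."""
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 mod 4")
    x = 1
    for k in range(2, (p - 1) // 2 + 1):
        x = x * k % p
    return x


for p in (5, 13, 17, 29):
    x = sqrt_of_minus_one(p)
    print(p, x, (x * x) % p == p - 1)
```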


We now tie this result to sums of squares.
Lemma 3.2.3 If $p$ is a prime with $p \equiv 1 \bmod 4$, then $p = a^2 + b^2$ with $(a, b) = 1$.


Proof Note first that if $p = a^2 + b^2$ then $a, b$ must be relatively prime. Otherwise a common divisor $d > 1$ of $a$ and $b$ would give $d^2 \mid p$.

Now suppose $p \equiv 1 \bmod 4$. Then from the previous lemma $-1$ is a quadratic residue mod $p$. Let $x_0$ then be a solution to $x^2 \equiv -1 \bmod p$.

Let $K = [\sqrt{p}]$ be the greatest integer less than or equal to $\sqrt{p}$. Since $p$ is prime, it is not a perfect square, so clearly

$$K < \sqrt{p} < K + 1 \implies K^2 < p < (K + 1)^2.$$

Consider the set of integers

$$S = \{u + x_0 v : 0 \le u \le K,\ 0 \le v \le K\}.$$

There are $K + 1$ choices for each of $u$ and $v$, and hence there are $(K + 1)^2$ pairs $(u, v)$. Since $p < (K + 1)^2$ and there are only $p$ residue classes mod $p$, two distinct pairs must give elements of $S$ which are congruent modulo $p$. Hence there exist pairs $(u_1, v_1) \ne (u_2, v_2)$ with

$$u_1 + x_0 v_1 \equiv u_2 + x_0 v_2 \bmod p.$$

Now if $u_1 = u_2$ we have $x_0 v_1 \equiv x_0 v_2 \bmod p$. But $x_0$ is a unit mod $p$, so then $v_1 \equiv v_2 \bmod p$. Since both $v_1, v_2$ lie between 0 and $K < p$, it follows that $v_1 = v_2$. Similarly, if $v_1 = v_2$ it follows that $u_1 = u_2$. Since the pairs $(u_1, v_1)$ and $(u_2, v_2)$ are distinct, it follows that $u_1 \ne u_2$ and $v_1 \ne v_2$.

We may rewrite the congruence as

$$u_1 - u_2 \equiv x_0(v_2 - v_1) \bmod p.$$

Let $a = u_1 - u_2$, $b = v_2 - v_1$. Then $a \ne 0$, $b \ne 0$ and $a \equiv x_0 b \bmod p$. Therefore

$$a \equiv x_0 b \implies a^2 \equiv x_0^2 b^2 \equiv -b^2 \implies a^2 + b^2 \equiv 0 \bmod p.$$

Hence $p \mid (a^2 + b^2)$. We show that $p = a^2 + b^2$. Since $0 \le u_1 \le K$ and $0 \le u_2 \le K$, it follows that $-K \le u_1 - u_2 \le K$. Then $a^2 = (u_1 - u_2)^2 \le K^2 < p$. Analogously $b^2 < p$. Therefore $0 < a^2 + b^2 < 2p$. However, the only multiple of $p$ strictly between 0 and $2p$ is $p$ itself. Therefore $p = a^2 + b^2$. □
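The pigeonhole argument is itself an algorithm. The Python sketch below (hypothetical function name) scans the pairs $(u, v)$ with $0 \le u, v \le K$, stores each residue $u + x_0 v \bmod p$ in a dictionary, and stops at the first collision. A collision is guaranteed because $(K + 1)^2 > p$.

```python
from math import isqrt


def two_squares_pigeonhole(p, x0):
    """Write a prime p = 1 mod 4 as a^2 + b^2, given x0 with x0^2 = -1 mod p (Lemma 3.2.3)."""
    K = isqrt(p)
    seen = {}  # residue of u + x0*v mod p  ->  (u, v)
    for u in range(K + 1):
        for v in range(K + 1):
            key = (u + x0 * v) % p
            if key in seen:  # (K + 1)^2 > p guarantees that this happens
                u1, v1 = seen[key]
                a, b = u1 - u, v - v1  # then a = x0 * b mod p and 0 < a^2 + b^2 < 2p
                return abs(a), abs(b)
            seen[key] = (u, v)
    raise ValueError("no collision: is p prime with x0^2 = -1 mod p?")


print(two_squares_pigeonhole(13, 5))
```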
Lemma 3.2.4 Suppose $n = a^2 + b^2$ and $q$ is a prime divisor of $n$. If $q \equiv 3 \bmod 4$ then $q \mid a$ and $q \mid b$, and in particular $q^2 \mid n$.


Proof Suppose $q \mid (a^2 + b^2)$ with $q$ a prime congruent to 3 mod 4. If $q \nmid a$ then $a$ is a unit mod $q$. Then

$$a^2 + b^2 \equiv 0 \implies b^2 \equiv -a^2 \implies (ba^{-1})^2 \equiv -1 \bmod q.$$

Hence $-1$ is a quadratic residue mod $q$, which contradicts $q \equiv 3 \bmod 4$ by Lemma 3.2.2. Hence $q \mid a$. Similarly $q \mid b$. But then $q^2 \mid (a^2 + b^2)$, and so $q^2 \mid n$. □


Theorem 3.2.3 Suppose $n \ge 2$ has the prime decomposition

$$n = 2^{a_0} p_1^{a_1} \cdots p_k^{a_k} q_1^{\gamma_1} \cdots q_t^{\gamma_t},$$

where $p_i \equiv 1 \bmod 4$ for $i = 1, \ldots, k$ and $q_j \equiv 3 \bmod 4$ for $j = 1, \ldots, t$. Then $n$ can be expressed as the sum of two squares if and only if all the exponents $\gamma_j$ of the primes congruent to 3 mod 4 are even.


We note that this theorem is also called Fermat’s Two-Square Theorem.


Proof Notice first that for integers $a, b, c, d$ we have

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2.$$

Therefore, if $m = uv$ where $u$ and $v$ are each a sum of two squares, then $m$ is also a sum of two squares.

Now $2 = 1 + 1 = 1^2 + 1^2$, so any power of 2 is a sum of two squares. Similarly, if $p \equiv 1 \bmod 4$ then from Lemma 3.2.3 $p$ is the sum of two squares, and hence any power of $p$ is the sum of two squares. If $\gamma = 2k$ is even and $q \equiv 3 \bmod 4$, then $q^\gamma = q^{2k} = (q^k)^2 + 0^2$ and $q^\gamma$ is a sum of two squares. Putting these all together, we have that if each exponent of a prime congruent to 3 mod 4 is even in the prime decomposition of $n$, then $n$ is the sum of two squares.

Conversely, suppose $n = a^2 + b^2$ and $q \mid n$ with $q \equiv 3 \bmod 4$. The proof of Lemma 3.2.4 gives $q \mid a$ and $q \mid b$, so that $n/q^2 = (a/q)^2 + (b/q)^2$ is again a sum of two squares. Repeating this argument until $q$ no longer divides the quotient shows that the exponent of $q$ in $n$ must be even. □
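Theorem 3.2.3 turns the question into a factorization test. The Python sketch below compares that test with a direct search for $n = a^2 + b^2$. The function names are hypothetical, and trial-division factoring is assumed for simplicity.

```python
from math import isqrt


def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f


def is_sum_of_two_squares(n):
    """Theorem 3.2.3: every prime q = 3 mod 4 must divide n to an even power."""
    return all(e % 2 == 0 for q, e in factorize(n).items() if q % 4 == 3)


def brute_force(n):
    """Search directly for n = a^2 + b^2 with 0 <= a <= sqrt(n)."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))


print([n for n in range(1, 50) if is_sum_of_two_squares(n)])
```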


We now prove Theorem 3.2.2.


Proof (Theorem 3.2.2) Suppose $n = a^2 + b^2$ with $(a, b) = 1$. Then $(n, b) = 1$. Otherwise a common prime divisor of $n$ and $b$ would divide $a^2$, and hence $a$. Hence $b$ is a unit mod $n$, and so $b^{-1}$ exists mod $n$. Then

$$n = a^2 + b^2 \implies a^2 + b^2 \equiv 0 \bmod n \implies (ab^{-1})^2 \equiv -1 \bmod n.$$

Therefore $-1$ is a quadratic residue mod $n$.


Conversely, suppose $-1$ is a quadratic residue mod $n$. We show that $n = a^2 + b^2$ with $(a, b) = 1$ by using a modification of the proof of Lemma 3.2.3. Let $x_0$ be a solution of $x^2 \equiv -1 \bmod n$. Then there exist integers $y, b$ with $(y, b) = 1$ and $0 < b \le \sqrt{n}$ such that

$$\left|\frac{x_0}{n} + \frac{y}{b}\right| < \frac{1}{b\sqrt{n}}$$

(see exercises). Now let

$$a = x_0 b + ny.$$

Then $a \equiv x_0 b \bmod n$, and hence $a^2 + b^2 \equiv 0 \bmod n$. Now $|a| = nb\left|\frac{x_0}{n} + \frac{y}{b}\right| < \sqrt{n}$, so

$$0 < a^2 + b^2 < 2n.$$

As in the proof of Lemma 3.2.3, the only multiple of $n$ in this range is $n$ itself, and therefore $n = a^2 + b^2$. Further $(a, b) = 1$. To see this, notice that we have

$$n = (x_0 b + ny)^2 + b^2 = b^2 + x_0^2 b^2 + 2x_0 nby + n^2 y^2.$$

It follows that

$$1 = \frac{1 + x_0^2}{n}b^2 + x_0 by + x_0 by + ny^2 = ub + y(x_0 b + ny) = ub + ya,$$

where $u = \frac{1 + x_0^2}{n}b + x_0 y$ is an integer since $n \mid 1 + x_0^2$. Hence any common divisor of $a$ and $b$ divides 1. □
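The converse half of this proof is constructive once $x_0$ is known. The Python sketch below uses hypothetical function names and finds $x_0$ by brute force, which is an assumption suitable only for small $n$. It then tries each $b$ with $0 < b \le \sqrt{n}$ and takes $a$ to be the residue of $x_0 b \bmod n$ of smallest absolute value. The argument above guarantees that some such $b$ gives $a^2 + b^2 = n$. The identity $1 = ub + ya$ shows that the pair found is automatically primitive.

```python
from math import gcd, isqrt


def sqrt_minus_one_mod(n):
    """Some x0 with x0^2 = -1 mod n, or None if there is none (brute force)."""
    for x in range(n):
        if (x * x + 1) % n == 0:
            return x
    return None


def primitive_representation(n):
    """(a, b) with n = a^2 + b^2 and gcd(a, b) = 1, or None if -1 is not a square mod n."""
    x0 = sqrt_minus_one_mod(n)
    if x0 is None:
        return None
    for b in range(1, isqrt(n) + 1):
        a = x0 * b % n
        if a > n // 2:
            a -= n  # representative of x0*b mod n with the smallest absolute value
        if a * a + b * b == n:
            return a, b
    return None  # not reached when x0 exists, by the argument in the text


rep = primitive_representation(65)
print(rep, gcd(*rep))
```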


Theorem 3.2.2 gives a criterion to determine, for a given $n$, whether $n$ is representable as a sum of two squares $n = a^2 + b^2$ with $(a, b) = 1$. Such a representation is called a primitive representation. Combining the two forms of Fermat’s Two-Square Theorem we get the following corollary.


Corollary 3.2.2 An integer $n \ge 1$ has a primitive representation as a sum of two squares if and only if $n = 2^{\epsilon} p_1^{a_1} \cdots p_k^{a_k}$, where $\epsilon = 0$ or $\epsilon = 1$, each $a_i \in \mathbb{N}$ and each $p_i \equiv 1 \bmod 4$.


Proof From Fermat’s Two-Square Theorem, $n$ has a primitive representation if and only if $-1$ is a quadratic residue mod $n$. Then $-1$ must be a quadratic residue mod $p$ for any prime divisor $p$ of $n$. Therefore, by Lemma 3.2.2, any odd prime divisor of $n$ must be congruent to 1 mod 4. Further, $-1$ is not a quadratic residue mod $2^a$ if $a > 1$, since squares are congruent to 0 or 1 mod 4. Therefore the highest power of 2 which can divide $n$ is $2^1$. Conversely, suppose $n$ has the stated form. Then $-1$ is a quadratic residue modulo 2 and modulo each $p_i$, and a solution modulo $p_i$ lifts to one modulo $p_i^{a_i}$ (Hensel’s Lemma, Chapter 7). The Chinese Remainder Theorem then combines these into a solution modulo $n$. □


Theorems 3.2.2 and 3.2.3 characterize those integers $n$ for which there is a representation as a sum of two squares. The question can then be asked: how many different representations can there be? If we let

$$r(n) = \text{the number of pairs } (a, b) \in \mathbb{Z}^2 \text{ with } n = a^2 + b^2,$$

then the following can be proved (see [Za] or [NZ]). We leave the proof as an exercise (see Exercise 3.35).
Theorem 3.2.4 Let $r(n)$ be defined as above. Then

(1) $r(n) = 4\sum_{d \mid n} \chi(d)$, where

$$\chi(d) = 1 \text{ if } d \equiv 1 \bmod 4, \qquad \chi(d) = -1 \text{ if } d \equiv -1 \bmod 4, \qquad \chi(d) = 0 \text{ if } d \equiv 0 \bmod 2.$$

(2) $\sum_{n=1}^{\infty} \frac{r(n)}{n^s} = 4\zeta(s)L(s)$, where

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \quad \text{and} \quad L(s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \quad \text{with } \mathrm{Re}(s) > 1.$$

(3) $\frac{1}{4}r(mn) = \frac{1}{4}r(m) \cdot \frac{1}{4}r(n)$ if $(n, m) = 1$.

If $p \equiv 1 \bmod 4$ is a prime then

$$r(p) = 4\sum_{d \mid p} \chi(d) = 4(\chi(1) + \chi(p)) = 8.$$

For $p \equiv 3 \bmod 4$ we get $r(p) = 4(\chi(1) + \chi(p)) = 0$. For example, for $p = 5$ the 8 pairs are

$$(2, 1), (1, 2), (-1, 2), (2, -1), (1, -2), (-2, 1), (-1, -2), (-2, -1).$$

The function $\zeta(s)$ in the theorem is the Riemann zeta function, which we introduced earlier and which will play a crucial role in the proof of the prime number theorem. The function $\chi(n)$ is called a Dirichlet character and the function $L(s)$ a Dirichlet series. These will play a role in the proof of Dirichlet’s theorem.
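Parts (1) and (3) of Theorem 3.2.4 are easy to test before attempting Exercise 3.35. The Python sketch below uses hypothetical function names. It computes $r(n)$ both by counting lattice points and from the divisor sum, and checks multiplicativity on coprime pairs.

```python
from math import gcd, isqrt


def chi(d):
    """Character of Theorem 3.2.4: 1 if d = 1 mod 4, -1 if d = 3 mod 4, 0 if d is even."""
    return (0, 1, 0, -1)[d % 4]


def r_formula(n):
    """r(n) = 4 * sum of chi(d) over the divisors d of n."""
    return 4 * sum(chi(d) for d in range(1, n + 1) if n % d == 0)


def r_brute(n):
    """Count the pairs (a, b) in Z^2 with a^2 + b^2 = n directly."""
    k = isqrt(n)
    return sum(1 for a in range(-k, k + 1) for b in range(-k, k + 1) if a * a + b * b == n)


# Part (3): r(mn)/4 = (r(m)/4)(r(n)/4) for coprime m and n.
assert all(4 * r_formula(m * n) == r_formula(m) * r_formula(n)
           for m in range(1, 40) for n in range(1, 40) if gcd(m, n) == 1)
print([(n, r_formula(n), r_brute(n)) for n in (1, 2, 3, 5, 25, 65)])
```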


3.2.3 The Modular Group


If $R$ is any commutative ring with an identity, then the set of invertible $n \times n$ matrices with entries from $R$ forms a group under matrix multiplication, called the $n$-dimensional general linear group over $R$ (see [Ro]). This group is denoted by $GL_n(R)$. Since $\det(A)\det(B) = \det(AB)$ for square matrices $A, B$, it follows that the subset of $GL_n(R)$ consisting of those matrices of determinant 1 forms a subgroup. This subgroup is called the special linear group over $R$ and is denoted by $SL_n(R)$. In this section we concentrate on $SL_2(\mathbb{Z})$, or more specifically on a quotient of it, $PSL_2(\mathbb{Z})$. We use properties of this group to give another, more direct, proof of Fermat’s Two-Square Theorem.

The group $SL_2(\mathbb{Z})$ then consists of $2 \times 2$ integral matrices of determinant one:

$$SL_2(\mathbb{Z}) = \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1\right\}.$$

$SL_2(\mathbb{Z})$ is called the homogeneous modular group, and an element of $SL_2(\mathbb{Z})$ is called a unimodular matrix.

If $G$ is any group, its center, denoted by $Z(G)$, consists of those elements of $G$ which commute with all elements of $G$:

$$Z(G) = \{g \in G : gh = hg \text{ for all } h \in G\}.$$

It is easy to see that $Z(G)$ is a normal subgroup of $G$ (see exercises), and hence we can form the factor group $G/Z(G)$. For $G = SL_2(\mathbb{Z})$ the only unimodular matrices that commute with all others are $\pm I = \pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Therefore $Z(SL_2(\mathbb{Z})) = \{I, -I\}$. The quotient

$$SL_2(\mathbb{Z})/Z(SL_2(\mathbb{Z})) = SL_2(\mathbb{Z})/\{I, -I\}$$

is denoted $PSL_2(\mathbb{Z})$ and is called the projective special linear group or inhomogeneous modular group. More commonly, $PSL_2(\mathbb{Z})$ is just called the Modular Group and is denoted by $M$.

$M$ arises in many different areas of mathematics, including number theory, complex analysis, Riemann surface theory and the theory of automorphic forms and functions. $M$ is perhaps the most widely studied single finitely presented group. Complete discussions of $M$ and its structure can be found in the books Integral Matrices by M. Newman [New 1] or Algebraic Theory of the Bianchi Groups by B. Fine [F].

Since $M = PSL_2(\mathbb{Z}) = SL_2(\mathbb{Z})/\{I, -I\}$, it follows that each element of $M$ can be considered as $\pm A$, where $A$ is a unimodular matrix. A projective unimodular matrix is then

$$\pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1.$$

The elements of $M$ can also be considered as linear fractional transformations over the complex numbers:

$$z' = \frac{az + b}{cz + d}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1.$$

Thought of in this way, $M$ forms a Fuchsian group, which is a discrete group of isometries of the non-Euclidean hyperbolic plane. The book by Katok [K] gives a solid and clear introduction to such groups. This material can also be found in condensed form in [FR].

We will shortly describe the abstract structure of the group $M$. First, though, we use it to give a direct proof of Fermat’s Two-Square Theorem. We need the following lemma. Recall that the trace of a matrix $A$ is the sum of its diagonal elements. Trace is preserved under conjugation, so that $\mathrm{tr}(A) = \mathrm{tr}(T^{-1}AT)$ for any square matrix $A$ and invertible $T$. Recall also that in a group $G$ two elements $g, g_1$ are conjugate if there exists an $h \in G$ such that $h^{-1}gh = g_1$. Conjugation is an equivalence relation on a group, and the equivalence classes are called conjugacy classes.


Lemma 3.2.5 Let $A$ be a projective unimodular matrix with $\operatorname{tr}(A) = 0$. Then
$A$ is conjugate within $M$ to $X = \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. That is, there exists $T \in M$ with
$T^{-1}XT = A$.
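Proof Let $A = \pm\begin{pmatrix} a & b \\ c & -a \end{pmatrix}$. Let $S$ be the set of conjugates of $A$ within $M$, so that

$$S = \{ T^{-1}AT : T \in M \}.$$

Since conjugation preserves trace, $S$ consists of matrices of trace zero. Let

$$Y = \pm\begin{pmatrix} a & b \\ c & -a \end{pmatrix}$$

be an element of $S$ with $|a|$ minimal. This exists from the well ordering of $\mathbb{Z}$. We
show that $a$ must equal zero.
Suppose $a \neq 0$. Then

$$-a^2 - bc = 1 \implies -bc = a^2 + 1 \implies |b||c| = a^2 + 1.$$

It follows that $b \neq 0$, $c \neq 0$ and either $|b| \le |a|$ or $|c| \le |a|$, since $|b|, |c| \ge |a| + 1$ would give
$|b||c| \ge (|a| + 1)^2 > a^2 + 1$. Assume first that $|c| \le |a|$. We may assume that $a > 0$ and $c > 0$. Then

$$0 \le a - c < a.$$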


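Now conjugate $Y$ by $T = \pm\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Then $T^{-1} = \pm\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$ and

$$T^{-1}YT = \pm\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & -a \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \pm\begin{pmatrix} a - c & 2a + b - c \\ c & c - a \end{pmatrix}.$$

But then $0 \le a - c < a$, contradicting the minimality of $|a|$.
If $|b| \le |a|$, assuming $a > 0$, $b > 0$, conjugate $Y$ by $T = \pm\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$. Then $T^{-1} = \pm\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and

$$T^{-1}YT = \pm\begin{pmatrix} a - b & b \\ 2a - b + c & b - a \end{pmatrix}.$$

Again $0 \le a - b < a$, contradicting the minimality of $|a|$.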
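Therefore in a minimal conjugate of $A$ we must have $a = 0$ and hence $-bc = 1$.
It follows that $b = \pm 1$ and $c = -b$, and therefore

$$Y = \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = X.$$

Now consider conjugates of $X$ within $M$. Let $T = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $T^{-1} = \pm\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ and

$$TXT^{-1} = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \pm\begin{pmatrix} -(bd + ac) & a^2 + b^2 \\ -(c^2 + d^2) & bd + ac \end{pmatrix}. \qquad (3.2.1)$$

Since $T$ runs over all of $M$, so does $T^{-1}$. Therefore any conjugate of $X$ must have form (3.2.1),
completing the proof.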
We now reprove Fermat’s Two-Square Theorem.


Theorem 3.2.5 (Fermat’s Two-Square Theorem) Let $n > 0$ be a natural number.
Then $n = a^2 + b^2$ with $(a, b) = 1$ if and only if $-1$ is a quadratic residue modulo $n$.


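Proof Suppose $-1$ is a quadratic residue mod $n$. Then there exists an $x$ with $x^2 \equiv -1 \bmod n$,
that is, $x^2 = -1 + mn$ for some $m \in \mathbb{Z}$. This implies that $-x^2 + mn = 1$, so that there is
a projective unimodular matrix

$$A = \pm\begin{pmatrix} x & n \\ -m & -x \end{pmatrix}.$$

The trace of $A$ is zero, so by Lemma 3.2.5 $A$ is conjugate within $M$ to $X$ and therefore
$A$ must have form (3.2.1). Comparing upper right entries gives $n = \pm(a^2 + b^2)$, and the sign is $+$
since $n > 0$. Further $(a, b) = 1$ since in finding form (3.2.1) we had $ad - bc = 1$.

Conversely suppose $n = a^2 + b^2$ with $(a, b) = 1$. Then there exist $c, d \in \mathbb{Z}$ with
$ad - bc = 1$ and hence there exists a projective unimodular matrix

$$T = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Then

$$TXT^{-1} = \pm\begin{pmatrix} -(bd + ac) & a^2 + b^2 \\ -(c^2 + d^2) & bd + ac \end{pmatrix} = \pm\begin{pmatrix} \alpha & n \\ \gamma & -\alpha \end{pmatrix}$$

with $\alpha = -(bd + ac)$ and $\gamma = -(c^2 + d^2)$. This matrix has determinant one, so

$$-\alpha^2 - n\gamma = 1 \implies \alpha^2 = -1 - n\gamma \implies \alpha^2 \equiv -1 \bmod n.$$

Therefore $-1$ is a quadratic residue mod $n$.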
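The conjugation argument of Lemma 3.2.5 is effectively an algorithm: keep conjugating by $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$ until the upper left entry is 0, then read $a, b$ off the first row of the accumulated conjugator $T$, since $A = TXT^{-1}$ has upper right entry $a^2 + b^2$ as in (3.2.1). The Python sketch below is one possible rendering, not code from the text: the function names are ours, it works with ordinary matrices in $SL_2(\mathbb{Z})$ rather than projective ones (which removes the sign bookkeeping), and it finds a square root of $-1$ mod $n$ by brute force, so it is only meant for small $n$.

```python
from math import gcd


def mat_mul(P, Q):
    """Product of 2x2 integer matrices stored as ((a, b), (c, d))."""
    return ((P[0][0] * Q[0][0] + P[0][1] * Q[1][0], P[0][0] * Q[0][1] + P[0][1] * Q[1][1]),
            (P[1][0] * Q[0][0] + P[1][1] * Q[1][0], P[1][0] * Q[0][1] + P[1][1] * Q[1][1]))


def sl2_inv(P):
    """Inverse of a determinant-one 2x2 integer matrix."""
    (a, b), (c, d) = P
    return ((d, -b), (-c, a))


def sign(v):
    return 1 if v > 0 else -1


def two_squares(n):
    """Return (a, b) with a^2 + b^2 == n and gcd(a, b) == 1, or None when -1 is
    not a square mod n (n >= 1)."""
    x = next((x for x in range(n) if (x * x + 1) % n == 0), None)
    if x is None:
        return None
    m = (x * x + 1) // n
    Y = ((x, n), (-m, -x))          # trace 0, determinant -x^2 + mn = 1
    T = ((1, 0), (0, 1))            # invariant: T^{-1} A T == Y for the original A
    while Y[0][0] != 0:
        (a, b), (c, _) = Y          # b*c = -(a^2 + 1), so b and c are nonzero
        if abs(c) <= abs(a):
            S = ((1, sign(a) * sign(c)), (0, 1))     # new upper-left entry: a - k*c
        else:                                        # then |b| <= |a|
            S = ((1, 0), (-sign(a) * sign(b), 1))    # new upper-left entry: a + k*b
        Y = mat_mul(mat_mul(sl2_inv(S), Y), S)
        T = mat_mul(T, S)
    assert Y == ((0, 1), (-1, 0))   # so A = T X T^{-1}, whose upper-right entry is a^2 + b^2
    return T[0][0], T[0][1]


a, b = two_squares(13)
print(a, b, a * a + b * b, gcd(a, b))   # prints: 3 -2 13 1
```

Each pass strictly lowers the absolute value of the upper left entry, mirroring the minimality argument, so the loop terminates.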


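This type of group theoretical proof can be extended in several directions.
Kern-Isberner and Rosenberger [KR 1] considered groups of matrices of the form

$$U = \begin{pmatrix} a & b\sqrt{N} \\ c\sqrt{N} & d \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ ad - Nbc = 1,$$

or

$$U = \begin{pmatrix} a\sqrt{N} & b \\ c & d\sqrt{N} \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ Nad - bc = 1.$$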
They then proved that if

$$N \in \{1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, 16, 18, 22, 25, 28, 37, 58\}$$

and $n \in \mathbb{N}$ with $(n, N) = 1$, then

(1) If $-N$ is a quadratic residue mod $n$ and $n$ is a quadratic residue mod $N$, then
$n$ can be written as $n = x^2 + Ny^2$ with $x, y \in \mathbb{Z}$.

(2) Conversely, if $n = x^2 + Ny^2$ with $x, y \in \mathbb{Z}$ and $(x, y) = 1$, then $-N$ is a
quadratic residue mod $n$ and $n$ is a quadratic residue mod $N$.

The proof of the above results depends on the class number of $\mathbb{Q}(\sqrt{-N})$
(see [KR 1]).

In another direction Fine [F 1] and [F 2] showed that the Fermat Two-Square
Property is actually a property satisfied by many rings $R$. These are called sum of
squares rings. For example, if $p \equiv 3 \bmod 4$ then $\mathbb{Z}_{p^n}$ for $n > 1$ is a sum of squares
ring.

We close this subsection by describing the group theoretical structure of both
$SL_2(\mathbb{Z})$ and $M = PSL_2(\mathbb{Z})$. This structure can be developed with only minimal
number theory.


Theorem 3.2.6 The group $SL_2(\mathbb{Z})$ is generated by the elements

$$X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}.$$

Further a complete set of defining relations for the group in terms of these generators
is given by

$$X^4 = Y^3 = I, \qquad X^2Y = YX^2.$$

In the language of combinatorial group theory we say that $SL_2(\mathbb{Z})$ has the presentation

$$SL_2(\mathbb{Z}) = \langle X, Y;\ X^4 = Y^3 = 1,\ X^2Y = YX^2 \rangle.$$


Proof We first show that $SL_2(\mathbb{Z})$ is generated by $X$ and $Y$, that is, every matrix $A$ in
the group can be written as a product of powers of $X$ and $Y$.

Let

$$U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Then a direct multiplication shows that $U = XY$, and we show that $SL_2(\mathbb{Z})$ is generated
by $X$ and $U$, which implies that it is also generated by $X$ and $Y$. Further

$$U^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix},$$

so that $U$ has infinite order.
Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2(\mathbb{Z})$. Then we have

$$XA = \begin{pmatrix} -c & -d \\ a & b \end{pmatrix} \quad \text{and} \quad U^kA = \begin{pmatrix} a + kc & b + kd \\ c & d \end{pmatrix}$$

for any $k \in \mathbb{Z}$. We may assume that $|c| \le |a|$; otherwise start with $XA$ rather than
$A$. If $c = 0$ then $A = \pm U^q$ for some $q$. If $A = U^q$ then certainly $A$ is in the group
generated by $X$ and $U$. If $A = -U^q$ then $A = X^2U^q$ since $X^2 = -I$. It follows that
here also $A$ is in the group generated by $X$ and $U$.
Now suppose $c \neq 0$. Apply the Euclidean algorithm to $a$ and $c$ in the following
modified way:

$$\begin{aligned} a &= q_0c + r_1 \\ -c &= q_1r_1 + r_2 \\ r_1 &= q_2r_2 + r_3 \\ &\;\;\vdots \\ (-1)^n r_{n-1} &= q_nr_n + 0 \end{aligned}$$

where $r_n = \pm 1$ since $(a, c) = 1$. Then

$$XU^{-q_n} \cdots XU^{-q_0}A = \pm U^{q_{n+1}} \quad \text{with } q_{n+1} \in \mathbb{Z}.$$

Then

$$A = X^mU^{q_0}XU^{q_1} \cdots XU^{q_n}XU^{q_{n+1}}$$

with $m = 0, 1, 2, 3$; $q_0, q_1, \ldots, q_{n+1} \in \mathbb{Z}$ and $q_0 \cdots q_n \neq 0$. Therefore $X$ and $U$, and
hence $X$ and $Y$, generate $SL_2(\mathbb{Z})$.

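We must now show that

$$X^4 = Y^3 = I, \qquad X^2Y = YX^2 \qquad (3.2.2)$$

is a complete set of defining relations for $SL_2(\mathbb{Z})$, that is, that every relation on these
generators is derivable from these (see [Ro] or [J] for a description of group
presentations). It is straightforward to see that $X$ and $Y$ do satisfy these relations. Assume
then that we have a relation

$$S = X^{\epsilon_1}Y^{\alpha_1}X^{\epsilon_2}Y^{\alpha_2} \cdots Y^{\alpha_m}X^{\epsilon_{m+1}} = I$$

with all $\epsilon_i, \alpha_i \in \mathbb{Z}$. Using the relations (3.2.2) we may transform $S$ so that

$$S = X^{\epsilon_1}Y^{\alpha_1}X \cdots XY^{\alpha_m}X^{\epsilon_{m+1}}$$

with $\epsilon_1, \epsilon_{m+1} \in \{0, 1, 2, 3\}$, $\alpha_i \in \{1, 2\}$ for $i = 1, \ldots, m$ and $m \ge 0$. Multiplying
by a suitable power of $X$ we obtain

$$S_m = Y^{\alpha_1}X \cdots Y^{\alpha_m}X = X^{\alpha}$$

with $m \ge 0$ and $\alpha \in \{0, 1, 2, 3\}$. Assume that $m \ge 1$ and let

$$S_m = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}.$$

We show by induction that

$$a, b, c, d \ge 0,\ b + c > 0 \qquad \text{or} \qquad a, b, c, d \le 0,\ b + c < 0.$$

This claim for the entries of $S_1$ is true for

$$YX = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \quad \text{and} \quad Y^2X = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}.$$

Suppose it is correct for $S_{m-1} = \begin{pmatrix} a_1 & -b_1 \\ -c_1 & d_1 \end{pmatrix}$. Then

$$YXS_{m-1} = \begin{pmatrix} a_1 & -b_1 \\ -(a_1 + c_1) & b_1 + d_1 \end{pmatrix}$$

and

$$Y^2XS_{m-1} = \begin{pmatrix} -a_1 - c_1 & b_1 + d_1 \\ c_1 & -d_1 \end{pmatrix}.$$

In both cases the new entries again satisfy the claim, so it is correct for all $S_m$ with $m \ge 1$. This gives a contradiction, for
the entries of $X^\alpha$ with $\alpha = 0, 1, 2$ or $3$ do not satisfy the claim. Hence $m = 0$ and $S$
can be reduced to a trivial relation by the given set of relations. Therefore they are a
complete set of defining relations and the theorem is proved.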
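The generation half of the proof can be sketched in Python as follows. This is our own variant, not the book’s exact bookkeeping: it uses ordinary floor division instead of the signed remainders above, records the factor $U^q$ and $X^{-1} = X^3$ at each step, and finishes with $\pm U^b$ written as $U^b$ or $X^2U^{-b}$. The matrices $X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $Y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$ and $U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ are the ones of Theorem 3.2.6; the helper names are hypothetical.

```python
X = ((0, -1), (1, 0))
Y = ((0, 1), (-1, -1))
U = ((1, 1), (0, 1))
I = ((1, 0), (0, 1))


def mul(P, Q):
    """Product of 2x2 integer matrices stored as ((a, b), (c, d))."""
    return ((P[0][0] * Q[0][0] + P[0][1] * Q[1][0], P[0][0] * Q[0][1] + P[0][1] * Q[1][1]),
            (P[1][0] * Q[0][0] + P[1][1] * Q[1][0], P[1][0] * Q[0][1] + P[1][1] * Q[1][1]))


def mat_pow(P, e):
    """P**e for an integer e >= 0."""
    R = I
    for _ in range(e):
        R = mul(R, P)
    return R


def u_pow(k):
    """U**k for any integer k."""
    return ((1, k), (0, 1))


def decompose(A):
    """Factor A in SL2(Z) as a list of ("X", e) and ("U", k) whose product is A."""
    word, M = [], A
    while M[1][0] != 0:
        q = M[0][0] // M[1][0]            # floor division: |a - q c| < |c|
        M = mul(X, mul(u_pow(-q), M))     # first column (a, c) -> (-c, a - q c)
        word += [("U", q), ("X", 3)]      # A = (word so far) U^q X^{-1} M, X^{-1} = X^3
    (a, b), _ = M                         # now M = [[a, b], [0, a]] with a = +-1
    word += [("U", b)] if a == 1 else [("X", 2), ("U", -b)]   # -U^{-b} = X^2 U^{-b}
    return word


def evaluate(word):
    """Multiply the factors of a word back together."""
    P = I
    for name, e in word:
        P = mul(P, mat_pow(X, e) if name == "X" else u_pow(e))
    return P


A = ((13, -8), (-21, 13))
print(evaluate(decompose(A)) == A)   # True
```

The absolute value of the lower left entry drops at every pass, which is why the loop terminates; the last line simply multiplies the word back out as a check.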


Corollary 3.2.3 The Modular Group $M = PSL_2(\mathbb{Z})$ has the presentation

$$M = \langle x, y;\ x^2 = y^3 = 1 \rangle.$$

Further $x, y$ can be taken as the linear fractional transformations

$$x : z' = -\frac{1}{z}, \qquad y : z' = -\frac{1}{z + 1}.$$

Proof The center of $SL_2(\mathbb{Z})$ is $\{I, -I\}$. Since $X^2 = -I$, setting $X^2 = I$ in the presentation
for $SL_2(\mathbb{Z})$ gives the presentation for $M$. Writing the projective matrices as linear
fractional transformations gives the second statement.


In group theoretical language this corollary says that M is the free product of a 
cyclic group of order 2 and a cyclic group of order 3 (see [Ro]). From this structure 
it is easy to show that any element of M of order 2 must be conjugate within M to x. 
Further a straightforward calculation shows that a projective unimodular matrix has 
order 2 if and only if its trace is zero. Combining these two facts gives an easy proof 
of Lemma 3.2.5 which was the crux of the proof of Fermat’s Two-Square Theorem. 
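The "straightforward calculation" can be organized through the Cayley–Hamilton relation $A^2 = \operatorname{tr}(A)A - I$ for $A \in SL_2(\mathbb{Z})$: trace zero gives $A^2 = -I$; conversely $A^2 = -I$ forces $\operatorname{tr}(A)A = 0$, and $A^2 = I$ forces $\operatorname{tr}(A)A = 2I$, hence $A = \pm I$. The short Python check below (our own illustration, confirming the equivalence only on a small box of matrices, not proving it) compares "order 2 in $M$" with "trace zero".

```python
from itertools import product

I = ((1, 0), (0, 1))
NEG_I = ((-1, 0), (0, -1))


def mul(P, Q):
    """Product of 2x2 integer matrices stored as ((a, b), (c, d))."""
    return ((P[0][0] * Q[0][0] + P[0][1] * Q[1][0], P[0][0] * Q[0][1] + P[0][1] * Q[1][1]),
            (P[1][0] * Q[0][0] + P[1][1] * Q[1][0], P[1][0] * Q[0][1] + P[1][1] * Q[1][1]))


def has_order_two_in_M(A):
    """A is not +-I but A^2 is +-I, i.e. A has order 2 as an element of M = PSL2(Z)."""
    return A not in (I, NEG_I) and mul(A, A) in (I, NEG_I)


checked = 0
for a, b, c, d in product(range(-5, 6), repeat=4):
    if a * d - b * c == 1:
        A = ((a, b), (c, d))
        assert has_order_two_in_M(A) == (a + d == 0)
        checked += 1
print(checked, "unimodular matrices checked")
```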


3.2.4 Lagrange’s Four Square Theorem 


In the last section we considered when a natural number can be expressed as a sum
of two squares. Here we prove the following theorem of Lagrange which shows that
any natural number can be expressed as the sum of four squares. In the language of
forms this says that any natural number is represented by the form $f(x, y, z, w) =
x^2 + y^2 + z^2 + w^2$. The Lagrange Four-Square Theorem is actually a special case
of Waring’s problem. In 1770 Edward Waring stated, but did not prove, that every
positive integer is a sum of nine cubes and also a sum of nineteen fourth powers.
Waring’s problem then became whether for each positive integer $k$ there is an
integer $s(k)$ such that every natural number is the sum of at most $s(k)$ $k$th powers. In
this formulation, Lagrange’s theorem says that $s(2) = 4$. Wieferich proved Waring’s
assertion about cubes, that is, that every natural number can be written as a sum of nine
cubes. D. Hilbert in 1909 proved Waring’s problem for all exponents $k$. Subsequently
several other proofs of this same result have been given, including ones by Hardy
and Littlewood [HL], Vinogradov [V] and Linnik [Li]. Linnik’s proof of the general
result can be found in the book of Nathanson [N]. We give a proof of the four square
result.


Theorem 3.2.7 (Lagrange) Every natural number $n$ can be represented as the sum
of four squares

$$n = a^2 + b^2 + c^2 + d^2$$

with $a, b, c, d \in \mathbb{Z}$.


Proof Now $1 = 1^2 + 0^2 + 0^2 + 0^2$ and $2 = 1^2 + 1^2 + 0^2 + 0^2$, so the theorem is
clearly true for $n = 1, 2$. Further the product of two sums of four squares is again a
sum of four squares. That is

$$(a^2 + b^2 + c^2 + d^2)(x^2 + y^2 + z^2 + w^2) = A^2 + B^2 + C^2 + D^2$$

where

$$A = ax + by + cz + dw, \qquad B = ay - bx - cw + dz,$$

$$C = az + bw - cx - dy, \qquad D = aw - bz + cy - dx.$$

This implies that we need only prove the theorem for primes. Therefore let $p \ge 3$
be a prime.
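The product identity is easy to spot-check numerically. The Python sketch below (a random test of our own, not a proof) evaluates $A, B, C, D$ exactly as defined above and compares both sides; the identity itself holds because the four coefficient vectors of $A, B, C, D$ in $(x, y, z, w)$ are mutually orthogonal, each of squared length $a^2 + b^2 + c^2 + d^2$.

```python
import random


def four_square_product(a, b, c, d, x, y, z, w):
    """The quantities A, B, C, D from the proof of Theorem 3.2.7."""
    A = a * x + b * y + c * z + d * w
    B = a * y - b * x - c * w + d * z
    C = a * z + b * w - c * x - d * y
    D = a * w - b * z + c * y - d * x
    return A, B, C, D


rng = random.Random(2024)
for _ in range(1000):
    a, b, c, d, x, y, z, w = (rng.randint(-100, 100) for _ in range(8))
    A, B, C, D = four_square_product(a, b, c, d, x, y, z, w)
    assert (a*a + b*b + c*c + d*d) * (x*x + y*y + z*z + w*w) == A*A + B*B + C*C + D*D
```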


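We need the following lemma.
Lemma 3.2.6 Let $p$ be a prime. Then there exist $x, y \in \mathbb{Z}$ with $x^2 + y^2 \equiv -1 \bmod p$.


Proof This is clear for $p = 2$, so assume $p \ge 3$. Consider the squares modulo $p$, that
is, consider the set

$$S = \{0^2, 1^2, 2^2, \ldots, (p - 1)^2\} \bmod p.$$

Since $a^2 \equiv b^2 \bmod p$ implies that $a \equiv \pm b \bmod p$, it follows that there are exactly $\frac{p + 1}{2}$
elements of $S$ which are incongruent mod $p$. Similarly the integers

$$-x^2 - 1 \quad \text{for } x = 0, 1, \ldots, p - 1$$

take exactly $\frac{p + 1}{2}$ values that are incongruent mod $p$. Since $\frac{p + 1}{2} + \frac{p + 1}{2} = p + 1 > p$, these
two sets of residues must overlap, so we must get some $x \in \{0, 1, 2, \ldots, p - 1\}$ such that
$-x^2 - 1 \equiv y^2 \bmod p$ for some $y \in \{0, 1, 2, \ldots, p - 1\}$.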
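The counting argument translates directly into a search: tabulate the squares mod $p$, then look for an $x$ whose value $-x^2 - 1$ lands in that table. The Python sketch below is an illustration with a name of our choosing; the final `raise` is unreachable for primes precisely because of the pigeonhole count above.

```python
def minus_one_as_two_squares(p):
    """Return (x, y) with x^2 + y^2 = -1 (mod p) for a prime p."""
    first_root = {}
    for y in range(p):
        first_root.setdefault(y * y % p, y)     # the (p + 1) // 2 squares mod p (p odd)
    for x in range(p):
        y = first_root.get((-x * x - 1) % p)
        if y is not None:                       # -x^2 - 1 is also a square: overlap found
            return x, y
    raise AssertionError("impossible for prime p by the counting argument")


print(minus_one_as_two_squares(7))   # prints (2, 3)
```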
Proof (Theorem 3.2.7) From Lemma 3.2.6 there are a natural number $m$ and integers
$x, y$ such that

$$mp = x^2 + y^2 + 1^2 + 0^2.$$

We may assume that $|x|, |y| < \frac{p}{2}$, so that $mp < \frac{p^2}{2} + 1 < p^2$ and hence $m < p$. If $m = 1$ then the theorem holds.
Suppose then that $m > 1$.
From the above we have that for each prime $p \ge 3$ there is an $m$ with $1 \le m < p$
and

$$mp = x^2 + y^2 + z^2 + w^2, \quad x, y, z, w \in \mathbb{Z}.$$

We will show that there is then a choice with $m = 1$.

Let $a, b, c, d$ be the residues of $x, y, z, w$ respectively mod $m$ with the
smallest absolute values. Then $|a|, |b|, |c|, |d|$ are all $\le \frac{m}{2}$. Then

$$pm = x^2 + y^2 + z^2 + w^2 \equiv a^2 + b^2 + c^2 + d^2 \equiv 0 \bmod m.$$


Hence

$$a^2 + b^2 + c^2 + d^2 = mm'.$$

It follows that

$$pm^2m' = (x^2 + y^2 + z^2 + w^2)(a^2 + b^2 + c^2 + d^2) = A^2 + B^2 + C^2 + D^2$$

where $A, B, C, D$ are as described at the beginning of the proof. From these expressions,
since

$$a \equiv x, \quad b \equiv y, \quad c \equiv z, \quad d \equiv w \bmod m,$$

it follows that

$$A \equiv B \equiv C \equiv D \equiv 0 \bmod m.$$

Dividing $A^2, B^2, C^2, D^2$ by $m^2$ we can then represent $pm'$ as a sum of
four squares.
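Now from

$$m' = \frac{a^2 + b^2 + c^2 + d^2}{m} \quad \text{and} \quad |a|, |b|, |c|, |d| \le \frac{m}{2}$$

we get that $m' \le m$. Moreover $m' \neq 0$, for otherwise $a = b = c = d = 0$, so $m$ divides each of
$x, y, z, w$, hence $m^2 \mid mp$ and $m \mid p$, which is impossible for $1 < m < p$. If $m' < m$ then we have a smaller multiple $m'$ of $p$ such that
$m'p$ is a sum of four squares. Assume then that $m' = m$. We show that in this case
$p$ is a sum of four squares. Now $m = m'$ implies that

$$|a| = |b| = |c| = |d| = \frac{m}{2}.$$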
Then

$$2a \equiv 2b \equiv 2c \equiv 2d \equiv 2x \equiv 2y \equiv 2z \equiv 2w \equiv 0 \bmod m.$$

It then follows that

$$4pm = 4x^2 + 4y^2 + 4z^2 + 4w^2 = vm^2$$

for some $v \in \mathbb{Z}$, $v \neq 0$. Hence $m \mid 4p$. From $(m, p) = 1$ we get that $m \mid 4$. Recall further
that $1 < m < p$, so $m = 2$ or $m = 4$.
If $m' = m = 4$ then $x, y, z, w$ are all even, so from above we get that

$$p = \left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2 + \left(\frac{z}{2}\right)^2 + \left(\frac{w}{2}\right)^2.$$

If $m = m' = 2$ then

$$4p = (1 + 1 + 0 + 0)\,2p = (1^2 + 1^2 + 0^2 + 0^2)(x^2 + y^2 + z^2 + w^2) = A^2 + B^2 + C^2 + D^2$$

with $A = x + y$, $B = y - x$, $C = z + w$ and $D = w - z$. Since $x, y, z, w$ are all odd, $A, B, C, D$ are all
even, and we get a representation of $p$ as a sum of four squares as above.

Therefore for each $pm$, $m > 1$, which is a sum of four squares we can find a $pm'$
with $m' < m$ which is also a sum of four squares. Therefore the minimal $m$ must be
1 and $p$ itself is a sum of four squares, proving the theorem.
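The descent in this proof is constructive, so it can be run. The Python sketch below follows the steps as written (start from Lemma 3.2.6, reduce by least absolute residues, divide $A, B, C, D$ by $m$, and handle the $m' = m$ cases $m = 4$ and $m = 2$ separately); the function names are ours and the code is an illustration of the argument, not an optimized algorithm.

```python
def four_square_product(a, b, c, d, x, y, z, w):
    """The quantities A, B, C, D from the proof of Theorem 3.2.7."""
    return (a * x + b * y + c * z + d * w,
            a * y - b * x - c * w + d * z,
            a * z + b * w - c * x - d * y,
            a * w - b * z + c * y - d * x)


def four_squares_prime(p):
    """Return a 4-tuple whose squares sum to the prime p, by the descent in the proof."""
    if p == 2:
        return (1, 1, 0, 0)
    half = p // 2
    first_root = {}
    for y in range(half + 1):
        first_root.setdefault(y * y % p, y)
    # Lemma 3.2.6 with 0 <= x, y <= (p - 1)/2, so m p = x^2 + y^2 + 1 with m < p
    x = next(x for x in range(half + 1) if (-x * x - 1) % p in first_root)
    v = [x, first_root[(-x * x - 1) % p], 1, 0]
    m = sum(t * t for t in v) // p
    while m > 1:
        r = [t % m for t in v]
        r = [t - m if t > m // 2 else t for t in r]        # least absolute residues
        m_new = sum(t * t for t in r) // m                   # a^2 + b^2 + c^2 + d^2 = m m'
        if m_new < m:
            v = [t // m for t in four_square_product(*r, *v)]   # now m' p = sum of squares
            m = m_new
        elif m == 4:                                         # every coordinate is even
            v, m = [t // 2 for t in v], 1
        else:
            assert m == 2                                    # the proof shows m | 4
            x, y, z, w = v                                   # every coordinate is odd
            v, m = [(x + y) // 2, (y - x) // 2, (z + w) // 2, (w - z) // 2], 1
    return tuple(v)


print(four_squares_prime(7))   # prints (2, -1, -1, 1)
```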


We note that we can further show that if a natural number $n$ is not of the form
$4^a(8b + 7)$ then $n$ can be expressed as a sum of three squares. However if $n =
4^a(8b + 7)$ then four squares are necessary. This is related to the following extension
of Waring’s problem. Hilbert’s solution showed that given $k$ there exists an $s(k)$
such that every natural number can be represented as a sum of $s(k)$ $k$th powers. The
extension asks to find the minimal value of $s(k)$. More details on this are in the book
of Ribenboim [Ri].
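The three-square criterion can at least be confirmed on an initial range by brute force. The Python sketch below (our own check, not a proof) collects every $x^2 + y^2 + z^2 \le N$ and compares the numbers that are missed with those of the form $4^a(8b + 7)$.

```python
from math import isqrt


def sums_of_three_squares(N):
    """All n <= N of the form x^2 + y^2 + z^2 with integers x, y, z."""
    found = set()
    for x in range(isqrt(N) + 1):
        for y in range(x, isqrt(N - x * x) + 1):
            for z in range(y, isqrt(N - x * x - y * y) + 1):
                found.add(x * x + y * y + z * z)
    return found


def is_excluded_form(n):
    """True if n = 4^a (8b + 7) for some integers a, b >= 0 (requires n >= 1)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


N = 5000
three = sums_of_three_squares(N)
exceptions = [n for n in range(1, N + 1) if n not in three]
assert all(is_excluded_form(n) == (n not in three) for n in range(1, N + 1))
print(exceptions[:8])   # [7, 15, 23, 28, 31, 39, 47, 55]
```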


3.2.5 The Infinitude of Primes Through Continued Fractions 


In this final part of Section 3.2 we give a proof of the infinitude of primes using
continued fractions. A complete discussion of the theory of continued fractions can
be found in [NZM]. We just touch on what we need for this proof.


Definition 3.2.2 Let $a_0, a_1, \ldots, a_n$ be a finite sequence of integers, all positive
except possibly $a_0$. Then a finite simple continued fraction is the rational number
defined by

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}}.$$

If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers, all positive except possibly
$a_0$, then an infinite simple continued fraction is determined by the limit of the finite
simple continued fractions formed up to $a_n$. Each of the finite simple continued
fractions is called a convergent of the infinite simple continued fraction.
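A finite simple continued fraction can be evaluated from the bottom up, and its convergents $A_m/B_m$ can be generated with the standard recurrences $A_m = a_mA_{m-1} + A_{m-2}$, $B_m = a_mB_{m-1} + B_{m-2}$ (see [NZM]; the $B$ recurrence reappears in the proof below). The Python sketch uses exact `Fraction` arithmetic; the function names are ours.

```python
from fractions import Fraction


def evaluate(quotients):
    """Value of the finite simple continued fraction [a0; a1, ..., an]."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value


def convergents(quotients):
    """Convergents A_m / B_m via A_m = a_m A_{m-1} + A_{m-2} and the same rule for B_m,
    starting from A_{-1} = 1, B_{-1} = 0."""
    A_prev, A, B_prev, B = 1, quotients[0], 0, 1
    result = [Fraction(A, B)]
    for a in quotients[1:]:
        A_prev, A = A, a * A + A_prev
        B_prev, B = B, a * B + B_prev
        result.append(Fraction(A, B))
    return result


print(evaluate([3, 7, 15, 1]))   # 355/113
print(convergents([3, 7, 15, 1]))
```

The last convergent always equals the value of the whole finite fraction, which gives a cheap consistency check between the two functions.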


The following can be proved (see [NZM]). 


Theorem 3.2.8 If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers, all positive
except possibly $a_0$, then they determine a unique infinite simple continued fraction,
that is, the limit of the convergents exists. Further this value is always an irrational number.


If the sequence defining a continued fraction becomes a periodic sequence after a
certain point, the resulting continued fraction is called a periodic continued fraction.
Consider an infinite continued fraction with sequence $a_0, a_1, \ldots$ and let $A_m, B_m$ be
the numerator and denominator respectively of the $m$th convergent. We need the
following results, the first being a theorem of Lagrange (see [P]).


Theorem 3.2.9 A real irrational number which is a solution of a quadratic equation

$$ax^2 + bx + c = 0$$

with $a, b, c \in \mathbb{Z}$ not all zero has a development as a periodic continued
fraction.


As a special case of the above theorem we have that if

$$x = p + \sqrt{p^2 + 1} \quad \text{with } p \neq 0,\ p \in \mathbb{Z},$$

then, since $x - 2p = \sqrt{p^2 + 1} - p = \frac{1}{x}$,

$$x = 2p + \cfrac{1}{2p + \cfrac{1}{2p + \cdots}}.$$

Lemma 3.2.7 ([P]) Suppose $d$ is a positive squarefree integer. If the development
of $\sqrt{d}$ as a periodic continued fraction has a period of odd length $m$, then the equation
$x^2 - dy^2 = -1$ has an integral solution and each positive solution $x, y$ is of the form
$x = A_i$, $y = B_i$ for $i = qm - 1$ with $q$ odd.


Using these we get the following proof of the infinitude of primes due to
Barnes [B].


Proof (The sequence of primes is infinite) As always assume there are only finitely
many prime numbers

$$p_1 = 2 < p_2 = 3 < \cdots < p_r.$$

Let $p = p_1 \cdots p_r$ and $q = p_2 \cdots p_r = \frac{p}{2}$. Now let

$$x = 2q + \cfrac{1}{2q + \cfrac{1}{2q + \cdots}}.$$

Then

$$x = q + \sqrt{q^2 + 1}.$$

Since $p_i$ does not divide $q^2 + 1$ for $i = 2, \ldots, r$, it follows that $q^2 + 1$ must be a
power of 2. Further this power must be odd, since $x$ is irrational. Hence

$$q^2 + 1 = 2^{2t + 1}, \quad t \in \mathbb{N}.$$

This gives

$$q^2 - 2\left(2^t\right)^2 = -1$$

and hence the Diophantine equation

$$x^2 - 2y^2 = -1$$

has a solution $x = q$, $y = 2^t$. From Lemma 3.2.7 then $\frac{q}{2^t}$ is an even convergent value
of

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}.$$

It can be shown that

$$B_{m+1} = a_{m+1}B_m + B_{m-1}, \quad m \ge 1,$$

where as before $B_k$ is the denominator of the $k$th convergent. From this it follows
that for $m \ge 1$, $B_{2m}$ is a positive odd integer $> 1$. Since $2^t$ is a power of 2, we then must have
$m = 0$ and hence

$$\frac{q}{2^t} = \frac{A_0}{B_0} = \frac{1}{1} = 1.$$

Then from $(q, 2^t) = 1$ we get $q = 1$, which is a contradiction since $q = p_2 \cdots
p_r \neq 1$.
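For $d = 2$ the period of $\sqrt{2} = [1; 2, 2, \ldots]$ has length 1, so Lemma 3.2.7 places the solutions of $x^2 - 2y^2 = -1$ at the even-indexed convergents. The Python sketch below (an illustration of our own) lists the pairs $(A_i, B_i)$ and checks the two facts Barnes’ argument relies on: $A_i^2 - 2B_i^2 = (-1)^{i+1}$, and every even-indexed denominator is odd, with $B_{2m} > 1$ for $m \ge 1$.

```python
def sqrt2_convergents(count):
    """First `count` pairs (A_i, B_i) for sqrt(2) = [1; 2, 2, 2, ...]."""
    A_prev, A, B_prev, B = 1, 1, 0, 1
    pairs = [(A, B)]
    for _ in range(count - 1):
        A_prev, A = A, 2 * A + A_prev
        B_prev, B = B, 2 * B + B_prev
        pairs.append((A, B))
    return pairs


for i, (A, B) in enumerate(sqrt2_convergents(8)):
    print(i, A, B, A * A - 2 * B * B)   # last column alternates -1, 1, -1, ...
```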


3.3 Dirichlet’s Theorem


If $(a, b) = 1$ for natural numbers $a$ and $b$, then Dirichlet’s Theorem states that there
are infinitely many primes in the arithmetic progression $\{an + b : n \in \mathbb{N}\}$. On one
hand, given the many proofs that we have exhibited of the infinitude of primes, this
may not seem surprising. However, when looked at in light of the prime number
theorem, which says that the primes get scarcer and scarcer as $x$ gets
larger, it is quite surprising. Since $an + b$ is linear in $n$, the distribution of numbers
in this sequence is uniform or regular on the integers. However since $\pi(x) \sim \frac{x}{\ln x}$ we
have that $\lim_{x \to \infty} \frac{\pi(x)}{x} = 0$. We can interpret this as saying that the probability of randomly choosing
a prime $\le x$ goes to zero as $x$ goes to $\infty$.

Earlier in this chapter we presented several special cases of Dirichlet’s Theorem.
Specifically we showed that there were infinitely many primes of the form $3n + 1$,
$3n + 2$, $4n + 1$, $4n + 3$, $8n + 1$, $8n + 3$, $8n + 5$ and $8n + 7$. Many other specific
situations, such as $6n + 5$, can be proved by the same techniques. The most general
case that we proved was Theorem 3.1.12, which showed that there were infinitely
many primes of the form $mn + 1$ for any positive integer $m$. A complete proof of the
full Dirichlet Theorem involves analysis and we present it in this section.


Theorem 3.3.1 (Dirichlet’s Theorem) Let $a, b$ be natural numbers with $(a, b) = 1$.
Then there are infinitely many primes of the form $an + b$ with $n > 0$.


Dirichlet’s proof rests on two concepts: Dirichlet characters and Dirichlet series.
The basic idea is to build, for each integer $a$, a series which would converge if there
were only finitely many primes congruent to $b$ mod $a$, and then show that this series
actually diverges. We discuss characters first.


Definition 3.3.1 For any positive integer $k$, a Dirichlet character modulo $k$ is a
complex valued function on the integers $\chi : \mathbb{Z} \to \mathbb{C}$ satisfying

1. $\chi(a) = 0$ if $(a, k) > 1$;
2. $\chi(1) \neq 0$;
3. $\chi(a_1a_2) = \chi(a_1)\chi(a_2)$ for all $a_1, a_2 \in \mathbb{Z}$;
4. $\chi(a_1) = \chi(a_2)$ whenever $a_1 \equiv a_2 \bmod k$.


From (3) and (4) it is clear that a Dirichlet character can be considered as a multiplicative
complex function on the set of residue classes modulo $k$. We will shorten
the notation and use the word character to mean a Dirichlet character modulo $k$.

From a group theoretical point of view a Dirichlet character is just a character of
a finite complex representation of the unit group $U(\mathbb{Z}_k)$. We will say more about this
after our discussion of characters.

As an example consider the function

$$\chi_0(a) = 0 \text{ if } (a, k) > 1, \qquad \chi_0(a) = 1 \text{ if } (a, k) = 1.$$

It is easy to verify that this is a character. Thus, modulo $k$, there is always at least
one character. The character above is called the principal character and exists as
defined for each $k$. We will presently show that there are $\phi(k)$ characters, where $\phi$ is
the Euler phi function, for each positive integer $k$.
We now describe some necessary properties of characters. In each of the following
results, when we say character we mean character modulo $k$, with $k > 0$ fixed.


Lemma 3.3.1 (1) For every character, $\chi(1) = 1$.
(2) For every character, if $(a, k) = 1$ then $\chi(a)^{\phi(k)} = 1$. Hence $|\chi(a)| = 1$ and
$\chi(a)$ is a $\phi(k)$-th root of unity.


Proof (1) Since $\chi$ is multiplicative we have $\chi(1) = \chi(1)\chi(1)$. Since $\chi(1) \neq 0$ it
follows that $\chi(1) = 1$.
(2) From Euler’s Theorem (Theorem 2.4.11) we have that if $(a, k) = 1$ then

$$a^{\phi(k)} \equiv 1 \bmod k.$$

Since a character is multiplicative, this implies

$$\chi(a)^{\phi(k)} = \chi\left(a^{\phi(k)}\right) = \chi(1) = 1.$$


Lemma 3.3.2 For every $k > 0$ there exist only finitely many characters mod $k$.


Proof Given $k$ there are only finitely many different residue classes mod $k$. If $a$ is a
positive residue mod $k$ then from the previous lemma $\chi(a)$ is either $0$ or a $\phi(k)$-th root of unity.
Hence there are only finitely many choices.


For the time being we will let $c$ denote the finite number of characters modulo $k$.
After we prove certain orthogonality relations we will show that $c = \phi(k)$.


Lemma 3.3.3 (1) If $\chi_1$ and $\chi_2$ are characters then so is $\chi_1\chi_2$, where

$$(\chi_1\chi_2)(a) = \chi_1(a)\chi_2(a).$$

(2) If $\chi$ is a character, so is its complex conjugate $\bar{\chi}$. Further $\chi(a)^{-1} = \bar{\chi}(a)$ whenever $(a, k) = 1$.
(3) If $\chi_1$ is a fixed character and $\chi$ runs over all characters, then so does $\chi_1\chi$.


Proof The proofs of (1) and (2) are straightforward verifications of the four properties
in the definition of a character and we leave these to the exercises.

For part (3) suppose that $(a, k) = 1$ and $\chi_1(a)\chi_2(a) = \chi_1(a)\chi_3(a)$. Then since
$\chi_1(a) \neq 0$ it follows that $\chi_2(a) = \chi_3(a)$. Hence if $\chi_1$ is a fixed character and we let
$\chi$ run over all $c$ distinct characters, then the $\chi_1\chi$ are again $c$ distinct characters and hence
must be all of them.


We need to prove certain orthogonality relations among the characters. The next
lemma is crucial to do this and contains much of the work in proving these results.


Lemma 3.3.4 If $d > 0$ and $(d, k) = 1$ with $d \not\equiv 1 \bmod k$, then there exists a character
$\chi$ for which $\chi(d) \neq 1$.
Proof Since $\chi(a) = 0$ if $(a, k) > 1$, to determine a character for which
$\chi(d) \neq 1$ we must only find a function satisfying properties (2), (3), (4) of the
definition of a character for $(a, k) = 1$.

Let $k = p_1^{e_1} \cdots p_n^{e_n}$ be the prime decomposition of $k$. Since $d \not\equiv 1 \bmod k$, it follows
that for one of the prime divisors $p$ of $k$ we have $d \not\equiv 1 \bmod p^t$, where $p^t$ is the exact power of $p$ dividing $k$.
Suppose first that $p$ is an odd prime divisor of $k$ satisfying this, that is $d \not\equiv 1 \bmod p^t$
where $p^t \mid k$. Then $p$ does not divide $d$ since $(d, k) = 1$.

Recall that the unit group modulo $p^t$ is cyclic, that is, there is a primitive root
$g$ modulo $p^t$. There are $\phi(\phi(p^t))$ such primitive roots; fix one of them, $g$ (see Theorem 2.4.3
and Section 2.4.4). If $(a, k) = 1$ then $a$ is a unit modulo $p^t$ and hence a power of $g$
modulo $p^t$. That is

$$a \equiv g^b \bmod p^t \quad \text{with } b \ge 0,$$

where $b$ is determined modulo $\phi(p^t)$, the order of $g$. Let $\omega$ be the root of unity given by

$$\omega = e^{2\pi i/\phi(p^t)}$$

and define, for each $a$ with $(a, k) = 1$ and $a \equiv g^b$ as above,

$$\chi(a) = \omega^b.$$

Further if $(a, k) > 1$ define $\chi(a) = 0$. This defines a function on the residue classes
mod $k$. We must show that $\chi$ is a character and that $\chi(d) \neq 1$.

Property (1) of the definition of a character is clear from the definition of $\chi$.
Now $\chi(1) = \omega^0 = 1$ since $g^0 = 1$. Hence $\chi(1) \neq 0$. Further if $(a_1, k) = (a_2, k) = 1$
then $a_1 \equiv g^{b_1}$ and $a_2 \equiv g^{b_2} \bmod p^t$. This implies that $\chi(a_1) = \omega^{b_1}$ and $\chi(a_2) = \omega^{b_2}$. But
$a_1a_2 \equiv g^{b_1 + b_2} \bmod p^t$ and hence

$$\chi(a_1a_2) = \omega^{b_1 + b_2} = \omega^{b_1}\omega^{b_2} = \chi(a_1)\chi(a_2).$$

Therefore $\chi$ is multiplicative.

Finally if $a_1 \equiv a_2 \bmod k$ then $a_1 \equiv a_2 \bmod p^t$ and hence $\chi(a_1) = \chi(a_2)$. Therefore
$\chi$ is a character. Since $d \not\equiv 1 \bmod p^t$, we have $d \equiv g^r \bmod p^t$ for some $r$ with $\phi(p^t)$ not
dividing $r$. Therefore

$$\chi(d) = \omega^r \neq 1.$$

The above proof works whenever we have an odd prime divisor $p$ of $k$ with $d \not\equiv 1
\bmod p^t$. This leaves only the prime 2. Now suppose that $d \not\equiv 1 \bmod 2^t$ where $2^t \mid k$.
If $k = 2q$ with $q$ odd then $d \equiv 1 \bmod 2$ since $(d, k) = 1$. Therefore if $d \not\equiv 1 \bmod k$
and $k = 2q$ with $q$ odd, there must exist an odd prime divisor $p$ of $k$ with $d \not\equiv 1 \bmod p^s$, $p^s \mid k$,
and we are back in the first case. Hence we may assume that $k = 2^tq$ with $t > 1$ and $d \not\equiv 1
\bmod 2^t$.

Now $d \equiv 1 \bmod 2$ and hence $d \equiv 1 \bmod 4$ or $d \equiv 3 \bmod 4$. We consider each of
these cases separately.
If $d \equiv 1 \bmod 4$ then $t > 2$. If $(a, k) = 1$ then clearly $(a, 2) = 1$. Then it can be
shown (see exercises) that

$$a \equiv (-1)^{\frac{a-1}{2}}5^b \bmod 2^t \quad \text{for some } b \ge 0.$$

Now let

$$\omega = e^{2\pi i/2^{t-2}}$$

and define $\chi(a) = \omega^b$. Since $b$ is determined mod $2^{t-2}$, it follows that $\chi$ is well defined
on the residue classes mod $k$. As in the odd case, if we define $\chi(a) = 0$ for $(a, k) > 1$
then it is straightforward to verify that $\chi$ is a character. Again as in the odd case, since
$d \not\equiv 1 \bmod 2^t$ and $d \equiv 1 \bmod 4$, we have $d \equiv 5^r \bmod 2^t$ with $r$ not divisible by $2^{t-2}$.
Hence $\chi(d) = \omega^r \neq 1$.

If $d \equiv 3 \bmod 4$ then $d \equiv -1 \bmod 4$. For $(a, k) = 1$ define

$$\chi(a) = (-1)^{\frac{a-1}{2}}.$$

As in the other cases it is straightforward to verify that $\chi$ is a character. Here $\chi(d) =
-1 \neq 1$. This completes the proof of Lemma 3.3.4.
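The odd-prime case of this construction can be carried out by machine. The Python sketch below takes the modulus to be $k = p^t$ itself (for a general $k$ with $p^t \mid k$ one would evaluate at $a \bmod p^t$ on the units of $k$), finds a primitive root by brute force, and tabulates $\chi(g^b) = \omega^b$; the names are ours and the values are floating-point complex numbers, so comparisons use a tolerance.

```python
import cmath
from math import gcd


def multiplicative_order(g, q):
    """Order of g modulo q; assumes gcd(g, q) == 1."""
    e, power = 1, g % q
    while power != 1:
        power = power * g % q
        e += 1
    return e


def character_from_primitive_root(p, t):
    """Character mod q = p**t (p an odd prime) as in the proof of Lemma 3.3.4:
    chi(g^b) = omega^b with omega = exp(2 pi i / phi(p^t)), and chi = 0 off the units."""
    q = p ** t
    phi = q - q // p
    g = next(g for g in range(2, q) if gcd(g, q) == 1 and multiplicative_order(g, q) == phi)
    omega = cmath.exp(2j * cmath.pi / phi)
    chi = [0] * q                      # chi[a] is the value on the residue class of a
    power = 1
    for b in range(phi):               # g^0, g^1, ..., g^(phi-1) run through all units
        chi[power] = omega ** b
        power = power * g % q
    return chi


chi = character_from_primitive_root(5, 2)   # a character mod 25
print(chi[7], chi[24])
```

Because $\omega$ is a primitive $\phi(p^t)$-th root of unity, this particular $\chi$ takes the value 1 only on the class of 1, so it works simultaneously for every admissible $d$.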


The next two theorems are called the orthogonality relations for Dirichlet characters.
They are special cases of general results on characters of representations of
finite groups.


Theorem 3.3.2 (Orthogonality Relations I) (1) If $\chi$ is a fixed character and $a$ runs
over a complete set of residue classes mod $k$, then

$$\sum_a \chi(a) = \phi(k) \text{ if } \chi = \chi_0, \qquad \sum_a \chi(a) = 0 \text{ if } \chi \neq \chi_0.$$

(2) If $a > 0$ is an integer, then if $\chi$ runs over the set of all $c$ characters,

$$\sum_\chi \chi(a) = c \text{ if } a \equiv 1 \bmod k, \qquad \sum_\chi \chi(a) = 0 \text{ if } a \not\equiv 1 \bmod k.$$

Proof (1) Let $\chi_0$ be the principal character as defined immediately after Definition 3.3.1.
That is

$$\chi_0(a) = 0 \text{ if } (a, k) > 1, \qquad \chi_0(a) = 1 \text{ if } (a, k) = 1.$$

If $a$ runs over a complete set of $k$ positive residue classes mod $k$, then

$$\sum_a \chi_0(a)$$

has $\phi(k)$ terms each with value 1 and $k - \phi(k)$ terms each with value 0. Hence

$$\sum_a \chi_0(a) = \phi(k).$$

If $\chi \neq \chi_0$ choose a $d$ with $d > 0$, $(d, k) = 1$ and $\chi(d) \neq 1$. Such a $d$ exists because $\chi$ and $\chi_0$
agree at every $a$ with $(a, k) > 1$, so they must differ at some $d$ with $(d, k) = 1$, where $\chi_0(d) = 1$.
Then as $a$ runs over a complete residue system mod $k$ so does $da$. Hence

$$\sum_a \chi(a) = \sum_a \chi(da).$$

But $\chi$ is multiplicative, so

$$\sum_a \chi(a) = \sum_a \chi(da) = \sum_a \chi(d)\chi(a) = \chi(d)\sum_a \chi(a).$$

Since $\chi(d) \neq 1$ it follows that $\sum_a \chi(a) = 0$.

(2) For $a \equiv 1 \bmod k$ the sum $\sum_\chi \chi(a)$ runs over $c$ characters. From Lemma 3.3.1
each of these has value 1 and the sum has value $c$.

If $(a, k) > 1$ then each of the terms in the sum is zero, so the sum vanishes.
If $(a, k) = 1$ but $a$ is not congruent to 1 modulo $k$, then there exists a character $\chi_1$
(by Lemma 3.3.4) with $\chi_1(a) \neq 1$. Now as $\chi$ runs over all $c$ characters, by
Lemma 3.3.3 so does $\chi_1\chi$. Hence

$$\sum_\chi \chi(a) = \sum_\chi \chi_1(a)\chi(a) = \chi_1(a)\sum_\chi \chi(a).$$

Since $\chi_1(a) \neq 1$ it follows that $\sum_\chi \chi(a) = 0$.


We can now prove that $c$, the number of distinct characters mod $k$, is exactly $\phi(k)$.
Corollary 3.3.1 There exist exactly $\phi(k)$ characters modulo $k$.


Proof Let $a$ run over a complete set of residues mod $k$; exactly $\phi(k)$ of them satisfy $(a, k) = 1$.
If we sum over all $c$ characters and all residues we get, using the orthogonality results above,

$$\sum_{a, \chi} \chi(a) = \sum_a \sum_\chi \chi(a) = c + 0 + \cdots + 0 = c,$$

since only the residue $a \equiv 1$ contributes. On the other hand

$$\sum_{a, \chi} \chi(a) = \sum_\chi \sum_a \chi(a) = \phi(k) + 0 + \cdots + 0 = \phi(k),$$

since only $\chi = \chi_0$ contributes. Therefore $c = \phi(k)$.


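Theorem 3.3.3 (Orthogonality Relations II) (1) If $\chi_1$ and $\chi_2$ are characters mod $k$
and $a$ runs over a complete set of residue classes mod $k$, then

$$\sum_a \chi_1(a)\bar{\chi}_2(a) = \phi(k) \text{ if } \chi_1 = \chi_2, \qquad \sum_a \chi_1(a)\bar{\chi}_2(a) = 0 \text{ if } \chi_1 \neq \chi_2.$$

(2) If $a > 0$ is an integer with $(a, k) = 1$ and $t$ is any integer, then if $\chi$ runs over the set of all $\phi(k)$
characters,

$$\sum_\chi \bar{\chi}(a)\chi(t) = \phi(k) \text{ if } a \equiv t \bmod k, \qquad \sum_\chi \bar{\chi}(a)\chi(t) = 0 \text{ if } a \not\equiv t \bmod k.$$

Proof (1) From Lemma 3.3.3 we have that $\bar{\chi}(a) = \chi(a)^{-1}$ for any character $\chi$ and any $a$ with $(a, k) = 1$.
Hence if $\chi_1 = \chi_2$ then

$$\chi_1(a)\bar{\chi}_2(a) = \chi_1(a)\chi_1(a)^{-1} = \chi_0(a)$$

for $(a, k) = 1$, while both sides vanish for $(a, k) > 1$; here $\chi_0$ is the principal character. Therefore from Theorem 3.3.2

$$\sum_a \chi_1(a)\bar{\chi}_2(a) = \sum_a \chi_0(a) = \phi(k).$$

If $\chi_1 \neq \chi_2$ then $\chi_1\bar{\chi}_2 \neq \chi_0$, since $\chi_1\bar{\chi}_2 = \chi_0$ would give $\chi_1 = \chi_2$ after multiplying by $\chi_2$.
Then again from Theorem 3.3.2

$$\sum_a \chi_1(a)\bar{\chi}_2(a) = 0.$$

(2) The proof of the second part of the theorem follows in an analogous manner
from Theorem 3.3.2. We leave the details to the exercises.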
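For a prime modulus $p$ the unit group is cyclic, so all characters can be written down explicitly as $\chi_r(g^b) = e^{2\pi irb/(p-1)}$, $r = 0, \ldots, p - 2$, for a primitive root $g$. The Python sketch below builds them this way and checks both sets of orthogonality relations numerically. It is an illustration under that assumption only: here the count $p - 1$ holds by construction, so it does not re-prove Corollary 3.3.1, and the function names are ours.

```python
import cmath


def characters_mod_prime(p):
    """All p - 1 Dirichlet characters mod an odd prime p, as lists chi[0..p-1]:
    chi_r(g^b) = exp(2 pi i r b / (p - 1)) for a primitive root g, and chi_r(0) = 0."""
    g = next(g for g in range(2, p) if len({pow(g, b, p) for b in range(p - 1)}) == p - 1)
    log = {pow(g, b, p): b for b in range(p - 1)}          # discrete logarithm table
    return [[0] + [cmath.exp(2j * cmath.pi * r * log[a] / (p - 1)) for a in range(1, p)]
            for r in range(p - 1)]


def close(u, v, tol=1e-9):
    return abs(u - v) < tol


def check_orthogonality(p):
    chars = characters_mod_prime(p)
    phi = p - 1
    # Theorem 3.3.2 (1): chi_0 (r = 0) sums to phi(p), every other character to 0
    assert all(close(sum(chi), phi if r == 0 else 0) for r, chi in enumerate(chars))
    # Theorem 3.3.2 (2): summing over the characters detects a = 1 (mod p)
    assert all(close(sum(chi[a] for chi in chars), phi if a == 1 else 0) for a in range(p))
    # Theorem 3.3.3 (1)
    for i, chi1 in enumerate(chars):
        for j, chi2 in enumerate(chars):
            s = sum(chi1[a] * chi2[a].conjugate() for a in range(p))
            assert close(s, phi if i == j else 0)
    # Theorem 3.3.3 (2)
    for a in range(1, p):
        for t in range(p):
            s = sum(chi[a].conjugate() * chi[t] for chi in chars)
            assert close(s, phi if t == a else 0)
    return True


print(check_orthogonality(11))   # True
```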


Before moving on to Dirichlet series we mention that Theorems 3.3.2 and 3.3.3
are special cases of general results in group representation theory. If $G$ is a finite
group, then a (matrix) representation of $G$ is a homomorphism $\rho : G \to GL_n(R)$
(see Section 3.2) for some $n$ and some ring $R$. Hence $\rho(g)$ is an invertible $n \times n$ matrix
for $g \in G$. The character of the representation $\rho$ is the function $\chi_\rho : G \to R$ given
by $\chi_\rho(g) = \operatorname{tr}(\rho(g))$. For any finite group $G$ there are orthogonality relations on the
set of characters which specialize, in the case of finite abelian groups (for complex
representations), to the theorems on Dirichlet characters. The book by Curtis and
Reiner [CR] is a standard reference on representations of finite groups. A more
elementary treatment can be found in the book by M. Newman [New 1].
The next ingredient in the proof of Dirichlet’s Theorem is Dirichlet series.


Definition 3.3.2 If $\chi$ is a character mod $k$, then the Dirichlet L-series is defined for
complex values $s$ by

$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}.$$


A rough outline of the way these series lead to a proof of Dirichlet’s Theorem is
as follows. Consider $(a, b) = 1$ and consider Dirichlet characters mod $a$. It can be
shown that for $s > 1$ the series $L(s, \chi)$ is an analytic function of $s$ and further, for
$s > 1$, satisfies an analogue of the Euler product (see Section 3.1.2 and [N]), that is,

$$L(s, \chi) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}.$$

Then by logarithmic differentiation

$$-\frac{L'(s, \chi)}{L(s, \chi)} = \sum_p \frac{\chi(p)\ln p}{p^s - \chi(p)}.$$

If we introduce the function $\Lambda$ on $\mathbb{N}$ by

$$\Lambda(n) = \begin{cases} \ln p & \text{for } n = p^c,\ c \ge 1, \\ 0 & \text{for all other } n > 0, \end{cases}$$

then the above can be rewritten as

$$-\frac{L'(s, \chi)}{L(s, \chi)} = \sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}.$$

The function $\Lambda(n)$ is called the von Mangoldt function and will also play a role
in the proof of the prime number theorem. Multiplying by $\bar{\chi}(b)$ and then summing
over all characters $\chi$ mod $a$, we get by the orthogonality relations

$$\sum_{\substack{n = 1 \\ n \equiv b \bmod a}}^{\infty} \frac{\Lambda(n)}{n^s} = \frac{1}{\phi(a)} \sum_\chi \bar{\chi}(b)\left(-\frac{L'(s, \chi)}{L(s, \chi)}\right).$$
As $s \to 1^+$ the left hand side becomes approximately

$$\sum_{p \equiv b \bmod a} \frac{\ln p}{p^s},$$

since the prime powers $p^c$ with $c \ge 2$ contribute only a bounded amount.
What must be shown is that the right hand side becomes infinite. This would then
imply that the number of primes congruent to $b$ mod $a$ must be infinite.

It can be shown that for the principal character we have $-\frac{L'(s, \chi_0)}{L(s, \chi_0)} \to \infty$ as $s \to 1^+$.
It follows that to show that the right hand side above becomes infinite we must show
that $\frac{L'(s, \chi)}{L(s, \chi)}$ remains bounded for any non-principal character. To show this we must
show that $L(1, \chi) \neq 0$ for any non-principal character. We now outline a series of
results which prove all these assertions.


Theorem 3.3.4 For any character $\chi$ mod $k$ the Dirichlet L-series is an analytic
function for $s > 1$. Further it has an Euler product representation

$$L(s, \chi) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}.$$

The proof of this theorem follows from the following sequence of lemmas.
Lemma 3.3.5 $L(s, \chi)$ is absolutely convergent for $s > 1$.


Proof From Lemma 3.3.1 we know that $|\chi(n)| \le 1$ and hence $\left|\frac{\chi(n)}{n^s}\right| \le \frac{1}{n^s}$. Therefore

$$\sum_{n=1}^{\infty} \left|\frac{\chi(n)}{n^s}\right| \le \sum_{n=1}^{\infty} \frac{1}{n^s},$$

which converges for $s > 1$. Hence $L(s, \chi)$ is absolutely convergent for $s > 1$.


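Lemma 3.3.6 The series

$$\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}$$

converges absolutely for $s > 1$ and further in this range

$$L'(s, \chi) = -\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}.$$

Proof For $s \ge 1 + \epsilon$, with $\epsilon > 0$, we have

$$\left|\frac{\chi(n)\ln n}{n^s}\right| \le \frac{\ln n}{n^{1 + \epsilon}}.$$

However $\sum_{n=1}^{\infty} \frac{\ln n}{n^{1 + \epsilon}}$ converges by the integral test. Thus the given series converges
uniformly for $s \ge 1 + \epsilon$ and hence absolutely for $s > 1$. Now $L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$,
so by uniform convergence we can differentiate termwise and therefore

$$L'(s, \chi) = -\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}.$$

(Recall that if $y = n^{-s}$ then $y' = -n^{-s}\ln n$.)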


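Let $\mu$ be the Möbius function defined for natural numbers $n$ by

$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^r & \text{if } n = p_1p_2 \cdots p_r \text{ with } p_1, \ldots, p_r \text{ distinct primes}, \\ 0 & \text{otherwise}. \end{cases}$$

Then:
Lemma 3.3.7 The series

$$\sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s}$$

converges absolutely for $s > 1$ and further in this range

$$L(s, \chi) \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s} = 1.$$

It follows that $L(s, \chi) \neq 0$ for $s > 1$.


Proof As before $\left|\frac{\chi(n)\mu(n)}{n^s}\right| \le \frac{1}{n^s}$, so the absolute convergence follows from the
convergence of the series $\sum_{n=1}^{\infty} \frac{1}{n^s}$ for $s > 1$.
Now it can be shown that for the Möbius function $\mu(n)$ we have

$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases}$$

(See Theorem 2.4.8 for a similar type result and Section 3.6 for a proof.)
Using this fact, and multiplying the absolutely convergent series term by term, we then have

$$L(s, \chi) \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s} = \sum_{m=1}^{\infty} \frac{\chi(m)}{m^s} \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s} = \sum_{t=1}^{\infty} \frac{\chi(t)}{t^s} \sum_{n \mid t} \mu(n) = 1,$$

where we used $\chi(m)\chi(n) = \chi(mn)$. Therefore

$$L(s, \chi) \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s} = 1.$$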
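Both steps of this proof are coefficient identities, so they can be checked exactly in integer arithmetic: the divisor sum of $\mu$, and the coefficient of $t^{-s}$ in the product $L(s, \chi)\sum_n \chi(n)\mu(n)n^{-s}$, which is $\sum_{n \mid t} \chi(t/n)\chi(n)\mu(n)$. The Python sketch below does this for the non-principal character mod 4, an example we chose because its values are the integers $0, \pm 1$; the helper names are ours.

```python
def mobius(n):
    """mu(n) by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0              # p^2 divides n
            result = -result
        p += 1
    return -result if m > 1 else result   # a leftover m > 1 is one more prime factor


def chi4(n):
    """The non-principal character mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


for t in range(1, 301):
    # sum_{d | t} mu(d) is 1 for t = 1 and 0 otherwise
    assert sum(mobius(d) for d in divisors(t)) == (1 if t == 1 else 0)
    # coefficient of t^{-s} in L(s, chi4) * sum_n chi4(n) mu(n) n^{-s}
    assert sum(chi4(t // n) * chi4(n) * mobius(n) for n in divisors(t)) == (1 if t == 1 else 0)
```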


We can now obtain the indicated Euler product representation for L(s, vy). 


Lemma 3.3.8 Fors > 1 we have the Euler product representation 


x(p). 
p* 


Ls, =|] 
Pp 


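Proof For $m > 1$ let $S$ be the set of all positive integers $n$ not divisible by any prime $p > m$. Expanding the finite product, and using that $\chi$ is completely multiplicative and that $\mu(n) = 0$ unless $n$ is squarefree, we have

$$\prod_{p\le m}\left(1-\frac{\chi(p)}{p^s}\right) = \sum_{n\in S}\frac{\chi(n)\mu(n)}{n^s}.$$

All $n \le m$ are included in the set $S$ and therefore

$$\prod_{p\le m}\left(1-\frac{\chi(p)}{p^s}\right) = \sum_{1\le n\le m}\frac{\chi(n)\mu(n)}{n^s} + \sum_{n'>m}\frac{\chi(n')\mu(n')}{n'^s},$$

where the second sum runs over those $n' > m$ which are not divisible by any prime $p > m$. Now as $m \to \infty$ the first sum on the right goes to

$$\sum_{n=1}^\infty\frac{\chi(n)\mu(n)}{n^s} = \frac{1}{L(s,\chi)}$$

by Lemma 3.3.7. The second sum on the right approaches $0$ since its absolute value is at most $\sum_{n>m}\frac{1}{n^s}$. Combining these,

$$\prod_p\left(1-\frac{\chi(p)}{p^s}\right) = \frac{1}{L(s,\chi)}, \quad\text{that is,}\quad L(s,\chi) = \prod_p \frac{1}{1-\frac{\chi(p)}{p^s}}.$$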
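Lemmas 3.3.7 and 3.3.8 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it takes $k = 4$ and the non-principal character mod 4 (a choice made here, not in the text), sets $s = 2$, and truncates every series and product at $10^5$. The helper names `chi4`, `sieve`, `mobius_table`, `l_series`, `mu_series` and `euler_product` are hypothetical. For this character $L(2,\chi)$ is Catalan's constant, approximately 0.9159655942.

```python
def chi4(n: int) -> int:
    """Assumed example: the non-principal character mod 4 (0 on even n, +1 if n = 1 mod 4, -1 if n = 3 mod 4)."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def sieve(n: int) -> list[bool]:
    """is_prime[i] for 0 <= i <= n (sieve of Eratosthenes)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return is_prime


def mobius_table(n: int) -> list[int]:
    """mu[i] for 0 <= i <= n: flip the sign once per prime divisor, then zero out non-squarefree i."""
    mu = [1] * (n + 1)
    mu[0] = 0
    for p, prime in enumerate(sieve(n)):
        if prime:
            for j in range(p, n + 1, p):
                mu[j] = -mu[j]
            for j in range(p * p, n + 1, p * p):
                mu[j] = 0
    return mu


def l_series(s: float, N: int) -> float:
    """Partial sum of L(s, chi) = sum chi(n)/n^s over n <= N."""
    return sum(chi4(n) / n ** s for n in range(1, N + 1))


def mu_series(s: float, N: int) -> float:
    """Partial sum of sum chi(n) mu(n)/n^s over n <= N (Lemma 3.3.7)."""
    mu = mobius_table(N)
    return sum(chi4(n) * mu[n] / n ** s for n in range(1, N + 1))


def euler_product(s: float, P: int) -> float:
    """Truncated Euler product over primes p <= P of 1/(1 - chi(p)/p^s) (Lemma 3.3.8)."""
    result = 1.0
    for p, prime in enumerate(sieve(P)):
        if prime:
            result /= 1.0 - chi4(p) / p ** s
    return result


N = 10 ** 5
L = l_series(2.0, N)
print(L, euler_product(2.0, N), L * mu_series(2.0, N))
```

The first two printed values should agree closely, and the third should be close to 1, up to truncation error.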


Recall that the von Mangoldt function $\Lambda(n)$ was defined for positive integers by

$$\Lambda(n) = \begin{cases} \ln p & \text{if } n = p^c,\ c \ge 1,\\ 0 & \text{for all other } n > 0.\end{cases}$$

We then get:


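Theorem 3.3.5 (1) For $s > 1$ we have

$$-\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n=1}^\infty \frac{\chi(n)\Lambda(n)}{n^s}.$$

(2) As $s \to 1^+$ we have, for the principal character $\chi_0$,

$$-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty.$$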
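Proof Since $|\chi(n)\Lambda(n)| \le \ln n$, it follows from Lemma 3.3.6 that the series $\sum_{n=1}^\infty \frac{\chi(n)\Lambda(n)}{n^s}$ converges absolutely for $s > 1$.
Now it can be shown, in a similar manner as for the Möbius function, that

$$\sum_{d\mid n}\Lambda(d) = \ln n$$

(see exercises). Hence for $s > 1$,

$$L(s,\chi)\sum_{n=1}^\infty\frac{\chi(n)\Lambda(n)}{n^s} = \sum_{m=1}^\infty\frac{\chi(m)}{m^s}\sum_{n=1}^\infty\frac{\chi(n)\Lambda(n)}{n^s} = \sum_{t=1}^\infty\frac{\chi(t)}{t^s}\sum_{d\mid t}\Lambda(d) = \sum_{t=1}^\infty\frac{\chi(t)\ln t}{t^s} = -L'(s,\chi).$$

Dividing by $L(s,\chi)$, which is nonzero for $s > 1$ by Lemma 3.3.7, proves the first part.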


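For the principal character $\chi_0$ we have $\chi_0(n) = 1$ if $(n,k) = 1$ and $\chi_0(n) = 0$ otherwise. Therefore from the first part of the theorem it follows that

$$-\frac{L'(s,\chi_0)}{L(s,\chi_0)} = \sum_{n\ge 1,\ (n,k)=1}\frac{\Lambda(n)}{n^s} = \sum_{n=1}^\infty\frac{\Lambda(n)}{n^s} - \sum_{p\mid k}\sum_{m=1}^\infty\frac{\ln p}{p^{ms}} = \sum_{n=1}^\infty\frac{\Lambda(n)}{n^s} - \sum_{p\mid k}\frac{\ln p}{p^s-1}.$$

As $s \to 1$ the second term on the right is finite. Hence to prove that $-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty$ as $s \to 1^+$ we need only show that the first term in the expression above diverges.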


124 3 The Infinitude of Primes 


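From Euler's proof of the infinitude of primes we know that $\sum_p \frac{1}{p}$ diverges. Since $\frac{\ln p}{p} > \frac{1}{p}$ for $p \ge 3$, it follows that $\sum_p \frac{\ln p}{p}$ diverges and hence so does $\sum_{n=1}^\infty \frac{\Lambda(n)}{n}$. Hence for every $t > 0$ there exists an $m = m(t)$ for which

$$\sum_{n=1}^m \frac{\Lambda(n)}{n} > t.$$

Since this finite sum is continuous in $s$, there is an $\epsilon(t) > 0$ such that for $1 < s < 1 + \epsilon(t)$ we then have

$$t < \sum_{n=1}^m\frac{\Lambda(n)}{n^s} \le \sum_{n=1}^\infty\frac{\Lambda(n)}{n^s}.$$

From this last inequality it follows that $\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} \to \infty$ as $s \to 1^+$.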
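The identity $\sum_{d\mid n}\Lambda(d) = \ln n$ used in this proof can be tested directly. The Python sketch below assumes nothing beyond the definition of $\Lambda$. The function names `von_mangoldt` and `divisor_sum_of_lambda` are illustrative, and trial division keeps it to small $n$.

```python
import math


def von_mangoldt(n: int) -> float:
    """Lambda(n) = ln p if n = p^c with c >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; n is a prime power iff nothing else divides it
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no factor up to sqrt(n), so n is prime


def divisor_sum_of_lambda(n: int) -> float:
    """Sum of Lambda(d) over the positive divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)


for n in (1, 12, 360, 1024):
    print(n, divisor_sum_of_lambda(n), math.log(n))
```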


We now have one big brick of Dirichlet's proof in place, namely that for the principal character

$$-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty \quad\text{as } s \to 1^+.$$

As explained above, we now need to know that $L(1,\chi)$ does not vanish for any non-principal character $\chi$. This is the most difficult part of the proof. First, three more preliminary results are needed.


Lemma 3.3.9 If $t \ge m \ge 1$ and $\chi$ is not the principal character, then

$$\left|\sum_{n=m}^t \chi(n)\right| \le \frac{\phi(k)}{2}.$$


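Proof By the orthogonality relations the sum $\sum \chi(a)$ over a complete set of residues mod $k$ is zero. Hence, removing blocks of $k$ consecutive integers from the given sum, we may assume that it has at most $k-1$ terms. In a complete set of residues exactly $\phi(k)$ terms have $|\chi(a)| = 1$ and all the remaining terms have $|\chi(a)| = 0$. If between $m$ and $t$ there are at most $\frac{\phi(k)}{2}$ terms with $|\chi(a)| = 1$, then

$$\left|\sum_{n=m}^t\chi(n)\right| \le \sum_{n=m}^t|\chi(n)| \le \frac{\phi(k)}{2}.$$

If there are more than $\frac{\phi(k)}{2}$ such terms, then the remaining $n$ with $t < n \le m+k-1$ contain fewer than $\frac{\phi(k)}{2}$ such terms, and since $\sum_{n=m}^{m+k-1}\chi(n) = 0$,

$$\left|\sum_{n=m}^t\chi(n)\right| = \left|\sum_{n=m}^{m+k-1}\chi(n) - \sum_{n=t+1}^{m+k-1}\chi(n)\right| = \left|\sum_{n=t+1}^{m+k-1}\chi(n)\right| \le \sum_{n=t+1}^{m+k-1}|\chi(n)| < \frac{\phi(k)}{2}.$$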
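Lemma 3.3.9 can be observed numerically for a prime modulus, where the characters are easy to write down. The sketch below assumes $k = p$ is an odd prime and builds the $p-1$ characters from a primitive root $g$ by $\chi_j(g^e) = e^{2\pi i je/(p-1)}$. This construction and the Python names `primitive_root`, `characters_mod_prime` and `max_block_sum` are assumptions of the illustration, not the text's definitions. The code computes the largest $\left|\sum_{n=m}^t \chi(n)\right|$ over a window and compares it with $\phi(p)/2 = (p-1)/2$.

```python
import cmath


def primitive_root(p: int) -> int:
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(1, p)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def characters_mod_prime(p: int) -> list[list[complex]]:
    """All p - 1 Dirichlet characters mod p as value lists chi[0..p-1]; chars[0] is principal."""
    g = primitive_root(p)
    log_table = {pow(g, e, p): e for e in range(p - 1)}  # discrete logarithm base g
    chars = []
    for j in range(p - 1):
        chi = [0j] * p  # chi(0) = 0 because p divides 0
        for a, e in log_table.items():
            chi[a] = cmath.exp(2j * cmath.pi * j * e / (p - 1))
        chars.append(chi)
    return chars


def max_block_sum(chi: list[complex], p: int, length: int) -> float:
    """Largest |chi(m) + ... + chi(t)| over 1 <= m <= t <= length, via prefix sums."""
    prefix = [0j]
    for n in range(1, length + 1):
        prefix.append(prefix[-1] + chi[n % p])
    return max(abs(prefix[t] - prefix[m - 1])
               for m in range(1, length + 1) for t in range(m, length + 1))


p = 11
worst = max(max_block_sum(chi, p, 3 * p) for chi in characters_mod_prime(p)[1:])
print(worst, (p - 1) / 2)
```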
Lemma 3.3.10 For any character $\chi$ and $s > 1$ we have the inequality

$$L(s,\chi_0)^3\,|L(s,\chi)|^4\,|L(s,\chi^2)|^2 \ge 1.$$

Proof For real numbers $x, \theta$ with $0 < x < 1$ we have the inequality

$$(1-x)^3\,|1-xe^{i\theta}|^4\,|1-xe^{2i\theta}|^2 \le 1$$

(see the exercises).

If $p$ is a prime which does not divide $k$, then $|\chi(p)| = 1$; let $\chi(p) = e^{i\theta}$ and let $x = \frac{1}{p^s}$. Applying the above inequality then gives

$$\left(1-\frac{1}{p^s}\right)^3\left|1-\frac{\chi(p)}{p^s}\right|^4\left|1-\frac{\chi^2(p)}{p^s}\right|^2 \le 1.$$

Multiplying over all primes $p \nmid k$ and using the Euler product representation of the $L$-series then gives the stated inequality.
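The inequality behind Lemma 3.3.10 comes from the identity $\ln\big((1-x)^3|1-xe^{i\theta}|^4|1-xe^{2i\theta}|^2\big) = -\sum_{n\ge1}\frac{x^n}{n}(1+2\cos n\theta)^2 \le 0$, which follows from the Taylor series of $\ln(1-z)$ together with $3 + 4\cos\alpha + 2\cos 2\alpha = (1+2\cos\alpha)^2$. A grid check in Python is a sanity check, not a proof; the grid sizes and the name `three_four_two` are arbitrary choices.

```python
import cmath
import math


def three_four_two(x: float, theta: float) -> float:
    """(1 - x)^3 |1 - x e^{i theta}|^4 |1 - x e^{2 i theta}|^2."""
    return ((1 - x) ** 3
            * abs(1 - x * cmath.exp(1j * theta)) ** 4
            * abs(1 - x * cmath.exp(2j * theta)) ** 2)


xs = [k / 200 for k in range(1, 200)]                  # sample points with 0 < x < 1
thetas = [2 * math.pi * k / 360 for k in range(360)]   # includes theta = 2*pi/3
worst = max(three_four_two(x, t) for x in xs for t in thetas)
print(worst)  # stays below 1, approaching it near theta = 2*pi/3 and small x
```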


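Lemma 3.3.11 For any non-principal character $\chi$ we have $|L'(s,\chi)| \le \phi(k)$ for $s > 1$.


Proof From Lemma 3.3.6 we have

$$|L'(s,\chi)| = \left|\sum_{n=1}^\infty\frac{\chi(n)\ln n}{n^s}\right|$$

for $s > 1$, and so we work with the right-hand sum.
It is straightforward to show that the function $f(t) = \frac{\ln t}{t^s}$ is a decreasing function for $t \ge 3$ and $s \ge 1$. Therefore from Lemma 3.3.9, together with summation by parts, we have for $t > m \ge 3$ the inequality

$$\left|\sum_{n=m}^t\frac{\chi(n)\ln n}{n^s}\right| \le \frac{\phi(k)}{2}\cdot\frac{\ln m}{m^s} \le \frac{\phi(k)}{2}\cdot\frac{\ln m}{m}.$$

Hence the series for $L'(s,\chi)$ converges uniformly for $s \ge 1$. In this range, taking $m = 3$ and letting $t \to \infty$, it follows that

$$|L'(s,\chi)| \le \frac{\ln 2}{2} + \frac{\phi(k)}{2}\cdot\frac{\ln 3}{3} < \phi(k).$$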


Theorem 3.3.6 $L(1,\chi) \ne 0$ for any non-principal character $\chi$, and further for any non-principal character $\frac{L'(s,\chi)}{L(s,\chi)}$ is bounded for $s > 1$.
Proof We break the proof into two pieces: the first for nonreal characters, that is, characters which take complex values, and the second for real, but not principal, characters. This second part is the most difficult.

From Lemma 3.3.9 we have for any non-principal character

$$\left|\sum_{n=m}^t\chi(n)\right| \le \frac{\phi(k)}{2}.$$

Therefore for any non-principal character, with $s > 1$,

$$|L(s,\chi)| \le \phi(k),$$

letting $m = 1$ and $t \to \infty$ in the above inequality and applying summation by parts with the decreasing weights $\frac{1}{n^s} \le 1$.
Assume first that $\chi$ is a nonreal character. Then $\chi^2$ is not the principal character, for if it were, $\chi$ would have to be real. Then from the remark above we have for $s > 1$ that $|L(s,\chi^2)| \le \phi(k)$. On the other hand, if $1 < s \le 2$ we have


$$L(s,\chi_0) = \sum_{n\ge1,\ (n,k)=1}\frac{1}{n^s} \le \sum_{n=1}^\infty\frac{1}{n^s} \le 1 + \int_1^\infty\frac{dz}{z^s} = 1 + \frac{1}{s-1} = \frac{s}{s-1} \le \frac{2}{s-1}.$$


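Applying Lemma 3.3.10 we have

$$|L(s,\chi)|^4 \ge \frac{1}{L(s,\chi_0)^3\,|L(s,\chi^2)|^2} \ge \frac{(s-1)^3}{8\,\phi(k)^2}.$$

If $L(1,\chi) = 0$ then, by Lemma 3.3.11, for $s > 1$

$$|L(s,\chi)| = |L(s,\chi) - L(1,\chi)| = \left|\int_1^s L'(t,\chi)\,dt\right| \le \phi(k)(s-1).$$

Hence for $1 < s \le 2$ we would have

$$\phi(k)^4(s-1)^4 \ge \frac{(s-1)^3}{8\,\phi(k)^2}, \quad\text{that is,}\quad s - 1 \ge \frac{1}{8\,\phi(k)^6}.$$

However this inequality is false for $s = 1 + \frac{1}{16\,\phi(k)^6}$. Therefore $L(1,\chi) \ne 0$ for $\chi$ any nonreal character.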
Now assume that $\chi$ is a real character but not the principal character. As remarked earlier, this is the most difficult part. To begin we define the function $f(n)$ on the positive integers $n$ by

$$f(n) = \sum_{d\mid n}\chi(d).$$

Then we can prove (see exercises) that $f(n) \ge 0$ for all $n \ge 1$ and $f(n) \ge 1$ if $n = c^2$ is a square.

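Let $m = (4\phi(k))^6$ and $z = \sum_{n=1}^m 2(m-n)f(n)$. Applying the definition of $f(n)$ we have

$$z = \sum_{uv\le m} 2(m-uv)\chi(v).$$

Since $f(n) \ge 0$ and $f(c^2) \ge 1$, keeping only the terms with $n = v^2$ (note that $\sqrt{m} = (4\phi(k))^3$ is an integer, at least 64) we have

$$z \ge \sum_{v=1}^{\sqrt{m}} 2(m - v^2) = \frac{4}{3}m^{3/2} - m - \frac{1}{3}m^{1/2} \ge m^{3/2} = (4\phi(k))^9.$$

Let

$$Z_1 = \sum_{u=1}^{m^{1/3}}\ \sum_{m^{2/3} < v \le m/u} 2(m-uv)\chi(v), \qquad Z_2 = \sum_{v=1}^{m^{2/3}}\ \sum_{0 < u \le m/v} 2(m-uv)\chi(v).$$

If $uv \le m$ then either $v \le m^{2/3}$, or $v > m^{2/3}$ and then $u < m^{1/3}$. This implies that

$$z = Z_1 + Z_2.$$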


Suppose that $z(n)$ is a complex valued function on the natural numbers. Let $c$ be a natural number and for $t \ge c$ let $r(t) = \sum_{n=c}^t z(n)$. Let $r(c-1) = 0$. For $d \ge c$ let $U = \max_{c\le n\le d}|r(n)|$ and let $\epsilon_c \ge \epsilon_{c+1} \ge \cdots \ge \epsilon_d \ge 0$. Then

$$\sum_{n=c}^d\epsilon_n z(n) = \sum_{n=c}^d\epsilon_n\big(r(n) - r(n-1)\big) = \sum_{n=c}^{d-1}r(n)(\epsilon_n - \epsilon_{n+1}) + r(d)\epsilon_d.$$

This then implies that

$$\left|\sum_{n=c}^d\epsilon_n z(n)\right| \le U\left(\sum_{n=c}^{d-1}(\epsilon_n - \epsilon_{n+1}) + \epsilon_d\right) = U\epsilon_c. \tag{3.3.1}$$
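The summation by parts identity and the bound (3.3.1) can be verified in exact rational arithmetic. In this Python sketch the test sequence $z(n)$ is taken to be the non-principal character mod 4 and $\epsilon_n = 1/n$, both assumptions made for illustration; `by_parts` is a hypothetical helper that evaluates the right-hand side of the identity.

```python
from fractions import Fraction


def chi4(n: int) -> int:
    """Non-principal character mod 4, used here only as a test sequence z(n)."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def by_parts(z: list[int], eps: list[Fraction]) -> Fraction:
    """sum_{n=c}^{d-1} r(n)(eps_n - eps_{n+1}) + r(d) eps_d with r(n) = z(c) + ... + z(n).

    Position 0 of each list corresponds to n = c."""
    r: list[int] = []
    running = 0
    for value in z:
        running += value
        r.append(running)
    last = len(z) - 1
    return sum((r[i] * (eps[i] - eps[i + 1]) for i in range(last)), Fraction(0)) + r[last] * eps[last]


c, d = 5, 60
z = [chi4(n) for n in range(c, d + 1)]
eps = [Fraction(1, n) for n in range(c, d + 1)]        # nonincreasing and nonnegative
direct = sum((e * v for e, v in zip(eps, z)), Fraction(0))
U = max(abs(sum(z[: i + 1])) for i in range(len(z)))  # max |r(n)| for c <= n <= d
print(direct == by_parts(z, eps), abs(direct) <= U * eps[0])  # True True
```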


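From Lemma 3.3.9

$$\left|\sum_{n=c}^d\chi(n)\right| \le \frac{\phi(k)}{2}.$$

Applying the above remarks to this inequality with $\epsilon_n = \frac{1}{n}$ we get

$$\left|\sum_{n=c}^d\frac{\chi(n)}{n}\right| \le \frac{\phi(k)}{2}\cdot\frac{1}{c} = \frac{\phi(k)}{2c}. \tag{3.3.2}$$

Similarly, for fixed $u$ the weights $2(m-uv)$ are nonnegative and decreasing in $v$ on the range $v \le m/u$ and are at most $2m$, so (3.3.1) gives

$$|Z_1| \le \sum_{u=1}^{m^{1/3}}\left|\sum_{m^{2/3}<v\le m/u}2(m-uv)\chi(v)\right| \le \sum_{u=1}^{m^{1/3}}2m\cdot\frac{\phi(k)}{2} = m^{4/3}\phi(k).$$

Now as defined

$$Z_2 = \sum_{v=1}^{m^{2/3}}\ \sum_{0<u\le m/v}2(m-uv)\chi(v).$$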


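Let $\theta = \frac{m}{v} - \left[\frac{m}{v}\right]$, where $[\ ]$ is the greatest integer function. Then $0 \le \theta < 1$ and

$$\sum_{0<u\le m/v}2(m-uv) = 2m\left[\frac{m}{v}\right] - v\left[\frac{m}{v}\right]\left(\left[\frac{m}{v}\right]+1\right) = 2m\left(\frac{m}{v}-\theta\right) - v\left(\frac{m}{v}-\theta\right)\left(\frac{m}{v}-\theta+1\right) = \frac{m^2}{v} - m + v(\theta - \theta^2).$$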
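The closed form for the inner sum of $Z_2$ is pure integer arithmetic and can be confirmed exactly. The Python sketch below assumes only the definitions of $\theta$ and $[\ ]$ above; `inner_sum` and `closed_form` are illustrative names.

```python
from fractions import Fraction


def inner_sum(m: int, v: int) -> int:
    """Direct evaluation of the sum of 2(m - u v) over 0 < u <= m/v."""
    return sum(2 * (m - u * v) for u in range(1, m // v + 1))


def closed_form(m: int, v: int) -> Fraction:
    """m^2/v - m + v(theta - theta^2), where theta = m/v - [m/v]."""
    theta = Fraction(m, v) - m // v
    return Fraction(m * m, v) - m + v * (theta - theta ** 2)


print(inner_sum(100, 7), closed_form(100, 7))  # 1330 1330
```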


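Since $0 \le \theta < 1$ we have $|\theta - \theta^2| < 1$, and hence, using Lemma 3.3.9 for the middle sum,

$$Z_2 = m^2\sum_{v=1}^{m^{2/3}}\frac{\chi(v)}{v} - m\sum_{v=1}^{m^{2/3}}\chi(v) + \sum_{v=1}^{m^{2/3}}\chi(v)\,v(\theta-\theta^2) \le m^2L(1,\chi) - m^2\sum_{v=m^{2/3}+1}^\infty\frac{\chi(v)}{v} + \frac{m\,\phi(k)}{2} + m^{4/3}.$$

Applying the inequality

$$\left|\sum_{n=c}^\infty\frac{\chi(n)}{n}\right| \le \frac{\phi(k)}{2c},$$

which follows from (3.3.2) by letting $d \to \infty$, with $c = m^{2/3}+1$, we obtain

$$Z_2 \le m^2L(1,\chi) + \frac{m^2\phi(k)}{2m^{2/3}} + \frac{m\,\phi(k)}{2} + m^{4/3} \le m^2L(1,\chi) + m^{4/3}\phi(k)\left(\frac12 + \frac12 + 1\right) = m^2L(1,\chi) + 2m^{4/3}\phi(k).$$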


It follows then, summarizing all these inequalities, that

$$(4\phi(k))^9 \le z = Z_1 + Z_2 \le m^2L(1,\chi) + 3m^{4/3}\phi(k) = m^2L(1,\chi) + 3(4\phi(k))^8\phi(k) = m^2L(1,\chi) + \frac{3}{4}(4\phi(k))^9.$$

This then clearly implies that $m^2L(1,\chi) \ge \frac14(4\phi(k))^9 > 0$ and therefore $L(1,\chi) > 0$. Hence $L(1,\chi) \ne 0$ for $\chi$ a real non-principal character, completing the proof that $L(1,\chi) \ne 0$ for any non-principal character.

We must now show that $\frac{L'(s,\chi)}{L(s,\chi)}$ remains bounded for $s > 1$. Since $L(1,\chi) \ne 0$ it follows that $\frac{1}{L(s,\chi)}$ is bounded for $s > 1$. From Lemma 3.3.11, $L'(s,\chi)$ is also bounded for $s > 1$, completing the proof.
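The two properties of $f(n) = \sum_{d\mid n}\chi(d)$ used in the real case (and set as an exercise) can be spot-checked for a few real characters. The Python sketch below assumes, as examples, the non-principal character mod 4 and the Legendre symbols mod 5 and mod 7, computed via Euler's criterion. `chi4`, `legendre_character` and `f` are hypothetical names.

```python
def chi4(n: int) -> int:
    """Non-principal character mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def legendre_character(p: int):
    """The real character n -> (n/p) for an odd prime p, by Euler's criterion."""
    def chi(n: int) -> int:
        if n % p == 0:
            return 0
        return 1 if pow(n, (p - 1) // 2, p) == 1 else -1
    return chi


def f(n: int, chi) -> int:
    """f(n) = sum of chi(d) over the positive divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)


for chi in (chi4, legendre_character(5), legendre_character(7)):
    smallest = min(f(n, chi) for n in range(1, 500))
    smallest_on_squares = min(f(c * c, chi) for c in range(1, 23))
    print(smallest, smallest_on_squares)  # prints 0 1 for each character
```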


The final piece is 


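Theorem 3.3.7 Suppose $(t,k) = 1$, $t > 0$. Then for $s > 1$ we have

$$-\frac{1}{\phi(k)}\sum_\chi\frac{1}{\chi(t)}\cdot\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n\equiv t \bmod k}\frac{\Lambda(n)}{n^s},$$

where the sum on the left runs over all $\phi(k)$ characters mod $k$.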




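Proof For $s > 1$ we have from Theorem 3.3.5 that

$$-\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n=1}^\infty\frac{\chi(n)\Lambda(n)}{n^s}.$$

Combining this with the orthogonality relations for characters, $\frac{1}{\phi(k)}\sum_\chi\frac{\chi(n)}{\chi(t)} = 1$ if $n \equiv t \bmod k$ and $0$ otherwise, we get

$$-\frac{1}{\phi(k)}\sum_\chi\frac{1}{\chi(t)}\cdot\frac{L'(s,\chi)}{L(s,\chi)} = \frac{1}{\phi(k)}\sum_\chi\frac{1}{\chi(t)}\sum_{n=1}^\infty\frac{\chi(n)\Lambda(n)}{n^s} = \sum_{n=1}^\infty\frac{\Lambda(n)}{n^s}\cdot\frac{1}{\phi(k)}\sum_\chi\frac{\chi(n)}{\chi(t)} = \sum_{n\equiv t \bmod k}\frac{\Lambda(n)}{n^s}.$$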
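The orthogonality relation used in this proof can be checked numerically, again assuming a prime modulus so that the characters can be built from a primitive root as $\chi_j(g^e) = e^{2\pi i je/(p-1)}$. The names in this Python sketch are illustrative. `orthogonality_sum(t, a)` computes $\sum_\chi \overline{\chi(t)}\chi(a)$, which equals $\sum_\chi \chi(a)/\chi(t)$ because $|\chi(t)| = 1$ when $(t,p) = 1$.

```python
import cmath


def primitive_root(p: int) -> int:
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(1, p)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def characters_mod_prime(p: int) -> list[list[complex]]:
    """The p - 1 Dirichlet characters mod p, each stored as a list of values chi[0..p-1]."""
    g = primitive_root(p)
    log_table = {pow(g, e, p): e for e in range(p - 1)}
    chars = []
    for j in range(p - 1):
        chi = [0j] * p
        for a, e in log_table.items():
            chi[a] = cmath.exp(2j * cmath.pi * j * e / (p - 1))
        chars.append(chi)
    return chars


p = 7
chars = characters_mod_prime(p)


def orthogonality_sum(t: int, a: int) -> complex:
    """Sum over all characters chi mod p of conj(chi(t)) * chi(a)."""
    return sum(chi[t % p].conjugate() * chi[a % p] for chi in chars)


print(abs(orthogonality_sum(3, 10)), abs(orthogonality_sum(3, 11)))  # about 6 and about 0
```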


We can now give the proof of Dirichlet’s Theorem. 


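Proof We suppose that $(a,b) = 1$ and we want to show that there are infinitely many primes of the form $an + b$, or equivalently infinitely many primes congruent to $b$ mod $a$. We consider the Dirichlet characters mod $a$. Apply Theorem 3.3.7 with $k = a$ and $t = b$ so that

$$-\frac{1}{\phi(a)}\sum_\chi\frac{1}{\chi(b)}\cdot\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n\equiv b \bmod a}\frac{\Lambda(n)}{n^s}.$$

As $s \to 1^+$ the left-hand side approaches $\infty$, since by Theorem 3.3.5 the term for the principal character satisfies $\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to -\infty$, while by Theorem 3.3.6 the other $\phi(a) - 1$ terms remain bounded. Therefore as $s \to 1^+$, with all congruences mod $a$,

$$\sum_{n\equiv b}\frac{\Lambda(n)}{n^s} \to \infty.$$

Now

$$\sum_{n\equiv b}\frac{\Lambda(n)}{n^s} = \sum_{p^m\equiv b,\ m\ge1}\frac{\ln p}{p^{ms}} = \sum_{p\equiv b}\frac{\ln p}{p^s} + \sum_{p^m\equiv b,\ m\ge2}\frac{\ln p}{p^{ms}}.$$

The second sum

$$\sum_{p^m\equiv b,\ m\ge2}\frac{\ln p}{p^{ms}}$$

remains bounded as $s \to 1^+$, since it is dominated by $\sum_p\sum_{m\ge2}\frac{\ln p}{p^m} = \sum_p\frac{\ln p}{p(p-1)} < \infty$. It follows that

$$\sum_{p\equiv b \bmod a}\frac{\ln p}{p^s} \to \infty \quad\text{as } s \to 1^+.$$

Therefore the number of primes $p \equiv b$ mod $a$ must be infinite, since a finite sum would remain bounded.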
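Dirichlet's Theorem says that every admissible class contains infinitely many primes; a quick count shows how evenly they fill up in practice. This Python sketch is an illustration only; `primes_up_to` and `count_by_residue` are hypothetical names, and the bounds are arbitrary.

```python
def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, flag in enumerate(is_prime) if flag]


def count_by_residue(a: int, N: int) -> dict[int, int]:
    """Number of primes p <= N in each residue class p mod a, sorted by residue."""
    counts: dict[int, int] = {}
    for p in primes_up_to(N):
        counts[p % a] = counts.get(p % a, 0) + 1
    return dict(sorted(counts.items()))


print(count_by_residue(4, 100))      # {1: 11, 2: 1, 3: 13}
print(count_by_residue(10, 10 ** 5))
```

Only the classes $b$ with $(b, a) = 1$ keep growing; the others contain at most one prime (2 mod 4, and 2 and 5 mod 10).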


Before leaving Dirichlet's Theorem we would like to mention a beautiful new result of Ben Green and Terence Tao [GT], also related to primes and arithmetic progressions. It is a classical conjecture that there are arbitrarily long arithmetic progressions of prime numbers. This conjecture was hinted at in the work of Lagrange and Waring in the late 1700s (see [D]). In 1939 van der Corput [VC] established that there are infinitely many triples of primes in arithmetic progression. Green and Tao [GT] proved the following.


Theorem 3.3.8 The prime numbers contain arithmetic progressions of length $k$ for all $k$. That is, for all $k \in \mathbb{N}$ there exist $a, b \in \mathbb{N}$ with $(a,b) = 1$ such that

$$a,\ a+b,\ a+2b,\ \ldots,\ a+(k-1)b$$

are all primes.


Their proof is probabilistic and nonconstructive and quite difficult. 


3.4 Twin Prime Conjecture and Related Ideas 


Twin primes are prime numbers $p$ and $q$ such that $|p - q| = 2$. For example $\{3,5\}$, $\{5,7\}$, $\{11,13\}$ are all pairs of twin primes. Trivially $\{2,3\}$ is the only pair of primes that differ by one. It is not known whether or not there are infinitely many pairs of twin primes, but an examination of the list of primes shows an abundance of such pairs and leads to the following conjecture.

Twin Primes Conjecture: There are infinitely many pairs of twin primes. 

In the direction of the twin primes conjecture there is a remarkable theorem of Brun which says essentially that even if there are infinitely many twin primes, the sum of their reciprocals converges. Recall that Euler proved that the sum $\sum_p \frac{1}{p}$ diverges. This implied that the sequence of primes is infinite. Here let

$$S = \{p : p \text{ prime and } p + 2 \text{ prime}\}.$$

That is, $S$ is the set of twin primes (more precisely, of the smaller member of each pair). Brun's Theorem is the following.
Theorem 3.4.1 (Brun) Let $S$ be the set of twin primes. Then

$$\sum_{p\in S}\frac{1}{p}$$

converges.


Notice that if $S$ is a finite set then certainly the sum converges. Brun's proof depends on a method known as Brun's sieve. We will look at this method, as well as the proof of Theorem 3.4.1, in Chapter 5. We mention some elementary facts about twin primes, leaving the proofs to the exercises.


Lemma 3.4.1 The integer 5 is the only prime appearing in two different twin prime 
pairs. 


Primes are those natural numbers which have exactly two positive divisors. The next lemma gives a similar characterization of twin primes.


Lemma 3.4.2 There is a one-to-one correspondence between twin prime pairs and those integers $n \ge 4$ for which $n^2 - 1$ has exactly four positive divisors.


Lemma 3.4.3 Suppose $p, q$ are primes. Then $pq + 1$ is a square if and only if $p$ and $q$ are twin primes.


Lemma 3.4.4 If $p, q$ are twin primes greater than 3 then $p + q$ is divisible by 12.
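Lemmas 3.4.1, 3.4.3 and 3.4.4 are easy to test by machine, and so is the slow growth of the partial sums in Brun's Theorem. The Python sketch below uses hypothetical helper names `primes_up_to` and `twin_pairs`. The bound 1000 is arbitrary, and the computation is evidence, not proof.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, flag in enumerate(is_prime) if flag]


def twin_pairs(limit: int) -> list[tuple[int, int]]:
    """All twin prime pairs (p, p + 2) with p + 2 <= limit."""
    primes = primes_up_to(limit)
    prime_set = set(primes)
    return [(p, p + 2) for p in primes if p + 2 in prime_set]


pairs = twin_pairs(1000)
print(pairs[:5])
# Lemma 3.4.3 (one direction): p(p + 2) + 1 = (p + 1)^2 is a perfect square
print(all(math.isqrt(p * q + 1) ** 2 == p * q + 1 for p, q in pairs))
# Lemma 3.4.4: p + q is divisible by 12 once p > 3
print(all((p + q) % 12 == 0 for p, q in pairs if p > 3))
# Partial sum of the reciprocals of the twin primes up to 1000
print(sum(1 / p + 1 / q for p, q in pairs))
```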


Brun's Theorem has been extended to other pairs of primes separated by a constant $d > 2$. For example, if $d = 4$ the pairs of primes of the form $(p, p+4)$ are called cousin primes. Again it is open whether there are infinitely many of these (for each $d$ or for any fixed $d$), but Segal [S] proved that for any given $d$ the sum of the reciprocals of such pairs is also convergent.

In 2014 Y. Zhang [Zh] proved that there is a positive constant with the property that infinitely many pairs of primes differ by less than that constant; his constant was 70,000,000. In 2015 J. Maynard [Ma] gave a numerical improvement, reducing the constant to 600.


3.5 Primes Between x and 2x 


In Theorem 2.3.2 we saw that there are arbitrarily large gaps in the sequence of primes. Despite this fact, the next result, known as Bertrand's Theorem, says that for any integer $x > 1$ there must be a prime between $x$ and $2x$. Bertrand verified this empirically for a large number of natural numbers and conjectured the result. The theorem was proved by Chebyshev.


Theorem 3.5.1 (Bertrand's Theorem) For every natural number $n > 1$ there is a prime $p$ such that $n < p < 2n$.
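Bertrand's Theorem can be confirmed for small $n$ by computer, which is essentially what Bertrand did by hand. The Python sketch below is illustrative; `sieve` and `bertrand_holds_up_to` are hypothetical names. For each $1 < n \le N$ it finds the least prime greater than $n$ and checks that it is below $2n$.

```python
def sieve(n: int) -> list[bool]:
    """is_prime[i] for 0 <= i <= n (sieve of Eratosthenes)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return is_prime


def bertrand_holds_up_to(N: int) -> bool:
    """True if for every 1 < n <= N there is a prime p with n < p < 2n."""
    limit = 2 * N
    is_prime = sieve(limit)
    # next_prime[i] = least prime >= i, or 0 if there is none up to limit
    next_prime = [0] * (limit + 2)
    for i in range(limit, -1, -1):
        next_prime[i] = i if is_prime[i] else next_prime[i + 1]
    return all(0 < next_prime[n + 1] < 2 * n for n in range(2, N + 1))


print(bertrand_holds_up_to(10 ** 5))  # True
```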
Chebyshev's proof of Bertrand's conjecture used techniques which he also used in obtaining a simple asymptotic bound on $\pi(x)$. This bound was a step on the road to the prime number theorem. We will give a proof of Chebyshev's Theorem in the next chapter and defer a proof of Bertrand's Theorem until then.


3.6 Arithmetic Functions and the Möbius Inversion Formula


In the course of Chapters 2 and 3 we used several functions, such as the Euler phi function $\phi(n)$, the sum of the divisors function $\sigma(n)$, the von Mangoldt function $\Lambda(n)$ and the Möbius function $\mu(n)$, whose domain was the natural numbers and whose range was contained in the complex numbers. Functions such as these are called arithmetic functions or number theoretic functions and play an extensive role in number theory. Several other functions of this type will be used in the proof of the prime number theorem. In this final section of this chapter we take a look at arithmetic functions in general and a very important result called the Möbius inversion formula.


Definition 3.6.1 An arithmetic function or number theoretic function is a function $f : \mathbb{N} \to \mathbb{C}$, whose domain is the natural numbers and whose range is a subset of the complex numbers.


Besides the arithmetic functions that we have mentioned already, very important examples are given by the divisor functions:

$\tau(n)$ = number of positive divisors of $n$,
$\sigma(n)$ = sum of the positive divisors of $n$,
$\sigma_k(n)$ = sum of the $k$th powers of the positive divisors of $n$.

These can also be written in the following form:

$$\tau(n) = \sum_{d\mid n}1, \qquad \sigma(n) = \sum_{d\mid n}d, \qquad \sigma_k(n) = \sum_{d\mid n}d^k.$$

We saw in Section 2.4.3 that if $\phi$ is the Euler phi function and $(m,n) = 1$ then $\phi(mn) = \phi(m)\phi(n)$. This property is called multiplicativity.
Definition 3.6.2 An arithmetic function $f$ is multiplicative if

$$f(mn) = f(m)f(n)$$

whenever $(m,n) = 1$.


If $n$ has a prime decomposition $n = p_1^{e_1}\cdots p_k^{e_k}$ and $f$ is a multiplicative arithmetic function then $f(n) = f(p_1^{e_1})\cdots f(p_k^{e_k})$. Therefore multiplicative arithmetic functions are uniquely determined by their values on prime powers. Further notice that for any $n$ we have $f(n) = f(n\cdot 1) = f(n)f(1)$. Hence if there is any $n$ with $f(n) \ne 0$ we must have $f(1) = 1$.

Multiplicativity is preserved under summing over divisors. More precisely, we have the following theorem.


Theorem 3.6.1 Suppose that $f(n)$ is a multiplicative arithmetic function and

$$F(n) = \sum_{d\mid n}f(d).$$

Then $F(n)$ is also multiplicative.


Proof Suppose that $n = n_1n_2$ with $(n_1,n_2) = 1$. If $d \mid n$ then, since $n_1$ and $n_2$ are relatively prime, it follows that $d = d_1d_2$ with $d_1 \mid n_1$, $d_2 \mid n_2$ and $(d_1,d_2) = 1$. Conversely, if $d = d_1d_2$ with $d_1 \mid n_1$ and $d_2 \mid n_2$ then $d \mid n$. This establishes a one-to-one correspondence between the positive divisors of $n$ and pairs of divisors $d_1, d_2$ of $n_1, n_2$ respectively. It follows that

$$F(n) = \sum_{d\mid n}f(d) = \sum_{d_1\mid n_1}\sum_{d_2\mid n_2}f(d_1d_2).$$

The function $f$ is assumed to be multiplicative and hence $f(d_1d_2) = f(d_1)f(d_2)$. Therefore

$$F(n) = \sum_{d_1\mid n_1}f(d_1)\sum_{d_2\mid n_2}f(d_2) = F(n_1)F(n_2),$$

proving the theorem.


This theorem can be used immediately to show that the divisor functions are multiplicative. It is clear from the Fundamental Theorem of Arithmetic and the definition that $\tau(n)$ is multiplicative. From the expressions

$$\sigma(n) = \sum_{d\mid n}d, \qquad \sigma_k(n) = \sum_{d\mid n}d^k,$$

in which the summands $d$ and $d^k$ are multiplicative functions of $d$, it follows from the theorem that these are also multiplicative.


Lemma 3.6.1 The divisor functions $\tau(n)$, $\sigma(n)$, $\sigma_k(n)$ are all multiplicative.


The multiplicativity of $\phi(n)$ was used in Section 2.4.3 to derive a closed form formula for $\phi(n)$ in terms of the standard prime decomposition. The same can be done for $\tau(n)$ and $\sigma(n)$.


Theorem 3.6.2 Suppose that $n = p_1^{e_1}\cdots p_k^{e_k}$. Then

$$\tau(n) = (e_1+1)\cdots(e_k+1),$$

$$\sigma(n) = \left(\frac{p_1^{e_1+1}-1}{p_1-1}\right)\left(\frac{p_2^{e_2+1}-1}{p_2-1}\right)\cdots\left(\frac{p_k^{e_k+1}-1}{p_k-1}\right).$$


Proof We will exhibit the proof for $\tau(n)$ and leave the derivation of $\sigma(n)$ for the exercises.

As in the derivation of the formula for $\phi(n)$, we establish the formula first for prime powers. The general result then follows from the multiplicativity.

Suppose then that $n = p^e$ and consider

$$\tau(n) = \sum_{d\mid n}1.$$

The divisors of $p^e$ are $1, p, p^2, \ldots, p^e$ and hence

$$\tau(n) = \tau(p^e) = \sum_{i=0}^e 1 = e + 1.$$

This proves the first part of the theorem.


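EXAMPLE 3.6.1 Compute $\tau(250)$ and $\sigma(250)$.
Now $250 = 2\cdot 5^3$, so

$$\tau(250) = \tau(2\cdot 5^3) = \tau(2)\tau(5^3) = 2\cdot 4 = 8.$$

Hence 250 has 8 positive divisors, namely $1, 2, 5, 5^2, 5^3, 2\cdot5, 2\cdot5^2, 2\cdot5^3$. Next

$$\sigma(250) = \left(\frac{2^2-1}{2-1}\right)\left(\frac{5^4-1}{5-1}\right) = (3)(156) = 468.$$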
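The formulas of Theorem 3.6.2 translate directly into code once $n$ has been factored. This Python sketch uses trial division for the factorization, an assumption suitable only for small $n$. `factorize`, `tau` and `sigma` are illustrative names; the last line reproduces Example 3.6.1.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization {p: e} of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau(n: int) -> int:
    """Number of divisors: product of (e_i + 1)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


def sigma(n: int) -> int:
    """Sum of divisors: product of (p_i^(e_i + 1) - 1)/(p_i - 1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p ** (e + 1) - 1) // (p - 1)
    return result


print(tau(250), sigma(250))  # 8 468
```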


An extremely important arithmetic function is the Möbius function that we introduced in Section 3.3 and used in the proof of Dirichlet's theorem. Recall that the Möbius function is defined for natural numbers $n$ by

$$\mu(n) = \begin{cases}1 & \text{if } n = 1,\\ (-1)^r & \text{if } n = p_1p_2\cdots p_r \text{ with } p_1,\ldots,p_r \text{ distinct primes},\\ 0 & \text{otherwise.}\end{cases}$$


Lemma 3.6.2 The Möbius function $\mu(n)$ is multiplicative.


Proof Suppose that $(n,m) = 1$. If either $n$ or $m$ is not squarefree then $mn$ is not squarefree. Hence in this case $\mu(mn) = 0$ and either $\mu(m) = 0$ or $\mu(n) = 0$, so that

$$\mu(mn) = \mu(n)\mu(m).$$

Hence we may assume that both $n$ and $m$ are squarefree. Assume

$$n = p_1\cdots p_k \quad\text{and}\quad m = q_1\cdots q_l$$

with the two having disjoint sets of prime factors. Then $\mu(n) = (-1)^k$ and $\mu(m) = (-1)^l$. Since the sets of prime factors are disjoint, the prime decomposition for $nm$ is

$$nm = p_1\cdots p_kq_1\cdots q_l.$$

Therefore

$$\mu(nm) = (-1)^{k+l} = (-1)^k(-1)^l = \mu(n)\mu(m).$$


Using the multiplicativity we obtain the following theorem.
Theorem 3.6.3 For the Möbius function $\mu(n)$,

$$\sum_{d\mid n}\mu(d) = \begin{cases}1 & \text{if } n = 1,\\ 0 & \text{if } n > 1.\end{cases}$$


Proof Clearly if $n = 1$,

$$\sum_{d\mid n}\mu(d) = \mu(1) = 1.$$

Since $\mu(n)$ is multiplicative, from Theorem 3.6.1

$$F(n) = \sum_{d\mid n}\mu(d)$$

is also multiplicative. Therefore we need only prove the result for prime powers.

Let $n = p^e$ with $e > 0$. Then the positive divisors of $n$ are $1, p, \ldots, p^e$ and hence

$$\sum_{d\mid n}\mu(d) = \sum_{i=0}^e\mu(p^i).$$

However $\mu(p^i) = 0$ if $i > 1$ and so

$$\sum_{d\mid n}\mu(d) = \mu(1) + \mu(p) = 1 + (-1) = 0,$$

completing the proof.
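Lemma 3.6.2 and Theorem 3.6.3 can be spot-checked by computing $\mu$ from a factorization. The Python sketch below uses trial division, suitable only for small $n$. `factorize`, `mobius` and `mobius_divisor_sum` are illustrative names.

```python
import math


def factorize(n: int) -> dict[int, int]:
    """Prime factorization {p: e} of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n: int) -> int:
    """mu(n) = 0 if some p^2 divides n, otherwise (-1)^(number of distinct prime factors)."""
    exponents = list(factorize(n).values())
    if any(e > 1 for e in exponents):
        return 0
    return -1 if len(exponents) % 2 else 1


def mobius_divisor_sum(n: int) -> int:
    """Sum of mu(d) over the positive divisors d of n (Theorem 3.6.3)."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)


print([mobius(n) for n in range(1, 11)])              # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
print([mobius_divisor_sum(n) for n in range(1, 11)])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Lemma 3.6.2: mu(mn) = mu(m) mu(n) whenever (m, n) = 1
print(all(mobius(m * n) == mobius(m) * mobius(n)
          for m in range(1, 50) for n in range(1, 50) if math.gcd(m, n) == 1))
```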


This result allows us to prove the following very important theorem, which has far-ranging applications.


Theorem 3.6.4 (Möbius Inversion Formula) Suppose that $f(n)$ is an arithmetic function and

$$F(n) = \sum_{d\mid n}f(d).$$

Then

$$f(n) = \sum_{d\mid n}\mu(d)F\left(\frac{n}{d}\right).$$

Conversely, if $F(n)$ is an arithmetic function and

$$f(n) = \sum_{d\mid n}\mu(d)F\left(\frac{n}{d}\right)$$

then

$$F(n) = \sum_{d\mid n}f(d).$$


Proof Consider

$$\sum_{d\mid n}\mu(d)F\left(\frac{n}{d}\right) = \sum_{d\mid n}\mu(d)\sum_{k\mid \frac{n}{d}}f(k) = \sum_{dk\mid n}\mu(d)f(k).$$

This last sum is taken over all ordered pairs $(d,k)$ with $dk \mid n$. This condition is symmetric in $(d,k)$, so we can reverse the roles of $d$ and $k$ to obtain

$$\sum_{d\mid n}\mu(d)F\left(\frac{n}{d}\right) = \sum_{k\mid n}f(k)\sum_{d\mid \frac{n}{k}}\mu(d).$$

From Theorem 3.6.3,

$$\sum_{d\mid \frac{n}{k}}\mu(d) = 0 \quad\text{unless}\quad \frac{n}{k} = 1.$$

Hence only the term $k = n$ survives, and the sum on the right-hand side reduces to $f(n)$, completing the first part.

Retracing the steps exactly in the opposite direction will prove the converse (see the exercises).
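The Möbius inversion formula can be exercised on a familiar pair. The sketch below relies on the standard fact $\sum_{d\mid n}\phi(d) = n$, so inverting $F(n) = n$ should reproduce $\phi(n)$. The Python names `mobius`, `summatory` and `mobius_invert` are hypothetical, and divisors are found by brute force, so it is meant only for small $n$.

```python
import math


def mobius(n: int) -> int:
    """mu(n) by trial division: 0 if p^2 | n for some prime p, else (-1)^(number of primes)."""
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign


def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def summatory(f, n: int):
    """F(n) = sum of f(d) over the divisors d of n."""
    return sum(f(d) for d in divisors(n))


def mobius_invert(F, n: int):
    """Recover f(n) = sum of mu(d) F(n/d) over the divisors d of n."""
    return sum(mobius(d) * F(n // d) for d in divisors(n))


def phi(n: int) -> int:
    """Euler phi by direct counting, for comparison only."""
    return sum(1 for j in range(1, n + 1) if math.gcd(j, n) == 1)


# Since the divisor sum of phi is n, inverting F(n) = n should give back phi(n).
print([mobius_invert(lambda m: m, n) for n in range(1, 13)])
print([phi(n) for n in range(1, 13)])
```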


The Möbius inversion formula is one instance of a general phenomenon: inversion formulas arise in many different areas of mathematics. An important continuous example is the Fourier Inversion Theorem. Suppose that $f(x)$ is an integrable function over the whole real line. Its Fourier transform is defined as the complex valued function given by

$$\hat f(w) = \int_{-\infty}^{\infty}f(u)e^{-iwu}\,du.$$

Then


Theorem 3.6.5 (The Fourier Inversion Theorem) If $f(x)$ is an integrable function, $\hat f(w)$ is its Fourier transform, and $\hat f$ is also integrable, then

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(w)e^{iwx}\,dw$$

at every point $x$ where $f$ is continuous.


This inversion theorem is used in the solution of partial differential equations and can also be used in a proof of the famous central limit theorem from mathematical statistics (see [Gr]). The Fourier transform is an example of an integral transform. We will see and use another such transform, the Mellin transform, in the proof of the prime number theorem.


3.7 Exercises


3.1 Show that for any real number $x$ with $0 < x < 1$ we have

$$\ln\left(\frac{1}{1-x}\right) = \sum_{n=1}^\infty\frac{x^n}{n} < \sum_{n=1}^\infty x^n = \frac{x}{1-x}.$$

(Hint: For the first part consider the Taylor series for $\ln(1-x)$. Start with the sum of a geometric series $\frac{1}{1-x} = 1 + x + x^2 + \cdots$ and integrate.)

3.2 Show that the Fermat numbers $F_1, F_2, F_3, F_4$ are all prime but that $F_5$ is composite (divisible by 641).


3.3 Suppose that $(a_n)$ is any sequence of integers greater than 1 with $(a_n, a_m) = 1$ if $n \ne m$. Prove that there exist infinitely many primes.


3.4 If $A_n = a^{2^n} + 1$ then prove:
(a) If $n > m \ge 1$ then $A_m \mid (A_n - 2)$,
(b) $(A_n, A_m) = 1$ if $n \ne m$ and $a$ is even,
(c) $(A_n, A_m) = 2$ if $n \ne m$ and $a$ is odd.


3.5 Determine, using the same types of methods used to find the value of the golden section, the value of
(4 


3.6 Recall from Section 3.2.5 that a continued fraction is defined in the following way:

Let $a_0, a_1, \ldots, a_n$ be a finite sequence of integers, all positive except possibly $a_0$. Then a finite simple continued fraction is the rational number defined by

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}}.$$

If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers, all positive except possibly $a_0$, then an infinite simple continued fraction is determined by the limit of the finite simple continued fractions formed up to $a_n$. Each of the finite simple continued fractions is called a convergent of the infinite simple continued fraction.

Find the values of the following infinite continued fractions.

(a) $a_n = 3$ for all $n$.

(b) $(a_n) = (1, 2, 1, 2, 1, 2, \ldots)$.


3.7 Prove Lemmas 3.1.6 and 3.1.7, that is, prove:
(a) $f_nf_{n+1} = f_1^2 + f_2^2 + \cdots + f_n^2$, $n \ge 1$.
(b) $f_n^2 - f_{n-1}f_{n+1} = (-1)^{n+1}$, $n \ge 1$,

where $f_n$ are the Fibonacci numbers.


3.8 Prove Lemma 3.1.8, that is, prove

$$f_{n+m} = f_{n-1}f_m + f_nf_{m+1}, \quad n, m \ge 1,$$

where $f_n$ are the Fibonacci numbers.


3.9 Prove

(a) $p \mid f_{p+1}$ if $p \equiv \pm 3 \bmod 10$ with $p$ prime

(b) $p \mid f_{p-1}$ if $p \equiv \pm 1 \bmod 10$ with $p$ prime

(Hint: Use the identities in the proof of Theorem 3.1.10.)


3.10 The real Chebyshev polynomials of the second kind can be defined by

$$S_0(x) = 0, \quad S_1(x) = 1, \quad S_{n+1}(x) = xS_n(x) - S_{n-1}(x).$$

Prove that
(a) If $-2 < x < 2$, $x = 2\cos\theta$, then

$$S_n(x) = \frac{\sin(n\theta)}{\sin\theta}.$$

(b) If $x > 2$, $x = 2\cosh\theta$, then

$$S_n(x) = \frac{\sinh(n\theta)}{\sinh\theta}.$$

(c) If $x = 2$ then $S_n(x) = n$.

(Hint: Use induction and trigonometric identities.)


3.11 Prove directly that there exist infinitely many primes of the form $8n + 3$.


3.12 Classify the Pythagorean triples where the hypotenuse differs by one from one of the legs.


3.13 Show that given integers $x_0, n$ with $x_0^2 \equiv -1 \bmod n$ there exist integers $y, b$ with $(y,b) = 1$, $0 < b < \sqrt{n}$ and


3.14 Show that the number of representations of $m > 1$ as a sum $m = a^2 + b^2$ with $(a,b) = 1$ is equal to the number of solutions of

$$x^2 \equiv -1 \bmod m.$$


3.15 Determine the set of integers represented by the quadratic forms
(a) $f(x,y) = 2x^2 + 2y^2$
(b) $f(x,y) = 2x^2 - 2y^2$


3.16 Show that a projective matrix (see Section 3.2.3) $X \in PSL(2,\mathbb{Z})$ has order 2 if and only if its trace is zero.


3.17 If $G$ is any group, its center, denoted by $Z(G)$, consists of those elements of $G$ which commute with all elements of $G$:

$$Z(G) = \{g \in G : gh = hg \ \ \forall h \in G\}.$$

Prove that $Z(G)$ is a normal subgroup of $G$.


3.18 Prove parts (1) and (2) of Lemma 3.3.5. That is, prove that

(a) If $\chi_1$ and $\chi_2$ are characters then so is $\chi_1\chi_2$, where $(\chi_1\chi_2)(a) = \chi_1(a)\chi_2(a)$.
(b) If $\chi$ is a character so is its complex conjugate $\bar\chi$. Further $\chi(a)^{-1} = \bar\chi(a)$ for $(a,k) = 1$.


3.19 Prove that if $a$ is an odd integer and $t \ge 2$ then

$$a \equiv (-1)^{\frac{a-1}{2}}5^b \bmod 2^t \quad\text{for some } b \ge 0.$$

(Hint: Separate into two cases where $a \equiv 1 \bmod 4$ and $a \equiv 3 \bmod 4$. Then use the facts that $5^b$ represents exactly $2^{t-2}$ numbers incongruent mod $2^t$ and that $5^b$ is periodic mod $2^t$ with period $2^{t-2}$.)


3.20 Fill in the details of the proof of the second part of Theorem 3.3.2. That is, prove that if $a > 0$ is an integer, $(t,k) = 1$, and $\chi$ runs over the set of all $\phi(k)$ characters, then

$$\sum_\chi\bar\chi(t)\chi(a) = \begin{cases}\phi(k) & \text{if } a \equiv t \bmod k,\\ 0 & \text{if } a \not\equiv t \bmod k.\end{cases}$$


3.21 Consider the von Mangoldt function $\Lambda(n)$ defined for positive integers by

$$\Lambda(n) = \begin{cases}\ln p & \text{if } n = p^c,\ c \ge 1,\\ 0 & \text{for all other } n > 0.\end{cases}$$

Prove that

$$\sum_{d\mid n}\Lambda(d) = \ln n.$$


3.22 Let $\chi$ be a real character mod $k$ and define $f(n) = \sum_{d\mid n}\chi(d)$. Prove that $f(n) \ge 0$ for all $n \ge 1$ and $f(n) \ge 1$ if $n = c^2$ is a square.


3.23 Prove Lemma 3.4.1, that is, prove: The integer 5 is the only prime appearing in two different twin prime pairs.


3.24 Prove Lemma 3.4.2, that is, prove: There is a one-to-one correspondence between twin prime pairs and those integers $n \ge 4$ for which $n^2 - 1$ has exactly four positive divisors.


3.25 Prove Lemma 3.4.3, that is, prove: Suppose $p, q$ are primes. Then $pq + 1$ is a square if and only if $p$ and $q$ are twin primes.
3.26 Prove Lemma 3.4.4, that is, prove: If $p, q$ are twin primes greater than 3 then $p + q$ is divisible by 12.


3.27 Prove that the divisor functions $\tau(n)$, $\sigma(n)$, $\sigma_k(n)$ are all multiplicative. (Fill in the details of the proof of Lemma 3.6.1.)


3.28 Prove that if $\sigma(n)$ is the sum of the positive divisors of $n$ and $n = p_1^{e_1}\cdots p_k^{e_k}$ then

$$\sigma(n) = \left(\frac{p_1^{e_1+1}-1}{p_1-1}\right)\left(\frac{p_2^{e_2+1}-1}{p_2-1}\right)\cdots\left(\frac{p_k^{e_k+1}-1}{p_k-1}\right)$$

(see Theorem 3.6.2).
3.29 Compute $\tau(n)$ and $\sigma(n)$ for $n = 105, 72, 788$.


3.30 Prove that if $F(n)$ is an arithmetic function and

$$f(n) = \sum_{d\mid n}\mu(d)F\left(\frac{n}{d}\right)$$

then

$$F(n) = \sum_{d\mid n}f(d).$$


3.31 Prove that for real numbers $x, \theta$ with $0 < x < 1$ we have the inequality

$$(1-x)^3\,|1-xe^{i\theta}|^4\,|1-xe^{2i\theta}|^2 \le 1.$$


3.32 Suppose that $f(n)$ and $g(n)$ are multiplicative arithmetic functions. Show that $F(n) = f(n)g(n)$ is also multiplicative.


3.33 Show that a natural number $p$ is a prime if and only if $\sigma(p) = p + 1$.


3.34 Use the multiplicativity to derive a formula for $\sigma_k(n)$, the sum of the $k$th powers of the positive divisors of $n$.


3.35 Prove Theorem 3.2.4 by using the Möbius inversion formula. (Hint: First prove part (3) directly.) A group theoretic proof is in [KR 2].


Chapter 4 
The Density of Primes 


4.1 The Prime Number Theorem—Estimates and History 


As we have seen, and proved in many different ways, there are infinitely many primes. In fact, as Dirichlet's Theorem shows, there are infinitely many primes in any arithmetic progression $an + b$ with $(a,b) = 1$. However, an examination of the list of positive integers shows that the primes become scarcer as the integers increase. This statement was quantified in Theorem 2.3.2, where we proved that there are arbitrarily large spaces or gaps within the sequence of primes. As a result of these observations the question arises concerning the distribution or density of the primes. The interest centers here on the prime number function $\pi(x)$ defined for positive integers $x$ by

$$\pi(x) = \text{number of primes} \le x.$$

Clearly $\pi(x) \to \infty$ as $x \to \infty$, so the appropriate question on the distribution of primes is what is the rate of growth of this function. The Prime Number Theorem asserts that asymptotically $\pi(x)$ is given by $\frac{x}{\ln x}$. Asymptotically means as $x$ goes to $\infty$. It has been touted as one of the most surprising results in mathematics given that it ties together the primes and the natural logarithm function in a simple way that is most unexpected. The proof of the prime number theorem, or more precisely the attempted proof by Riemann, is really considered the beginning of modern analytic number theory. This refers to the use of analytic methods, especially complex analysis, in the study of number theory. However, as we saw relative to Dirichlet's theorem, the use of hard analysis actually precedes Riemann's work.

The prime number theorem was originally conjectured by both Gauss and Legendre, although Euler also surmised the result. Gauss looked at the list of primes less than 3,000,000 and noticed that the prime number function is given very closely by the function $\mathrm{Li}(x)$, which is defined by the integral

$$\mathrm{Li}(x) = \int_2^x\frac{dt}{\ln t}.$$

Gauss' observation was then that

$$\pi(x) \approx \mathrm{Li}(x).$$

If integration by parts is used on the integral defining $\mathrm{Li}(x)$ and we take the limit as $x \to \infty$, it is clear that this integral is asymptotically $\frac{x}{\ln x}$. Hence Gauss's observation is then that (see Definition 4.2.2)

$$\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x} = 1.$$

This is the prime number theorem, which we now state formally.


Theorem 4.1.1 (Prime Number Theorem) If $\pi(x)$ is the prime number function then

$$\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x} = 1.$$


Legendre (actually published a bit earlier than Gauss), by looking at the list of primes up to 1,000,000, came up with a slightly different formula:

$$\pi(x) \approx \frac{x}{\ln x - 1.08366}.$$

Legendre's estimate is also asymptotically $\frac{x}{\ln x}$. Neither Gauss nor Legendre gave a proof of the prime number theorem nor an indication of how they arrived at their estimates. However, in hindsight a possible explanation is as follows. Looking at tables of $\pi(10^n)$ it is observed that as $n$ changes by 2 the ratio $\frac{x}{\pi(x)}$ increases by an almost constant amount, about 4.6, which is $2\ln(10) \approx 4.605$. This would suggest that $\frac{x}{\pi(x)} \approx \ln(10^n)$. The figures are as below.

x                 10^2     10^4     10^6     10^8       10^10        10^12
π(x)              25       1229     78498    5761455    455052511    37607912018
x/π(x)            4.000    8.137    12.739   17.357     21.975       26.590
ln(x)             4.605    9.210    13.816   18.421     23.026       27.631
π(x)/(x/ln x)     1.151    1.132    1.084    1.061      1.048        1.039
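The smaller entries of the table can be regenerated with a sieve. The Python sketch below uses illustrative names `prime_flags` and `pnt_rows`; the cutoff $10^6$ is chosen only to keep the run short. It computes $\pi(x)$, $x/\pi(x)$, $\ln x$ and $\pi(x)/(x/\ln x)$ for $x = 10^2, 10^4, 10^6$.

```python
import math


def prime_flags(N: int) -> bytearray:
    """flags[i] == 1 exactly when i is prime, for 0 <= i <= N (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(N) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, N + 1, i)))
    return flags


def pnt_rows(max_exponent: int) -> list[tuple[int, int, float, float, float]]:
    """Rows (x, pi(x), x/pi(x), ln x, pi(x)/(x/ln x)) for x = 10^2, 10^4, ..., 10^max_exponent."""
    flags = prime_flags(10 ** max_exponent)
    rows = []
    for e in range(2, max_exponent + 1, 2):
        x = 10 ** e
        pi_x = sum(flags[: x + 1])
        rows.append((x, pi_x, x / pi_x, math.log(x), pi_x / (x / math.log(x))))
    return rows


for x, pi_x, ratio, log_x, pnt in pnt_rows(6):
    print(f"{x:>9} {pi_x:>7} {ratio:8.3f} {log_x:8.3f} {pnt:7.3f}")
```

The larger entries ($10^8$ and beyond) need a segmented sieve or a combinatorial prime-counting method rather than this direct approach.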


The first real attempt to prove the prime number theorem was made by Chebyshev in 1848. He proved that there exist constants $A_1$ and $A_2$ with $0.922 < A_1 < 1$ and $1 < A_2 < 1.105$ such that

$$A_1 < \frac{\pi(x)}{x/\ln(x)} < A_2$$

for sufficiently large $x$. Further, he proved that if $\frac{\pi(x)}{x/\ln x}$ had a limit it would have to be 1. However, he could not prove that the function in the middle actually tends to a limit. In proving this result Chebyshev used the Riemann zeta function

$$\zeta(s) = \sum_{n=1}^\infty\frac{1}{n^s},$$

where $s > 1$ is a real variable. This function was introduced originally by Euler, who used it to give a proof of the infinitude of primes (see Section 3.1.2). This was really the first use of analysis in number theory.

Chebyshev's inequality has been improved upon many times. Sylvester in 1882 improved it to $A_1 = 0.95695$ and $A_2 = 1.04423$ for sufficiently large $x$. It can now be shown that for all $x > 10$, $A_1 = 1$ can be used.

In 1859, Riemann attempted to give a complete proof of the prime number theorem using the zeta function for complex variables $s$. Although he was not successful in proving the prime number theorem, he established many properties of the zeta function and showed that the prime number theorem depended on the zeros of the zeta function. He conjectured that all the zeros of $\zeta(s)$ in the strip $0 < \mathrm{Re}(s) < 1$ are along the line $\mathrm{Re}(s) = \frac12$. This is known as the Riemann hypothesis and is still an open problem. We will discuss both the Riemann zeta function and the Riemann hypothesis in Section 4.4. In 1896, building on the work of Riemann, Hadamard and, independently, C. de la Vallée Poussin proved the prime number theorem. Their proof relied heavily on complex analysis. It was felt for a long time that the prime number theorem was at least as complicated as the theory of complex variables. Most mathematicians doubted that a proof which did not heavily rely on the theory of analytic functions could be found. However in 1949, Selberg and later Erdős came up with an elementary proof of the prime number theorem. This proof is actually harder than the analytic proof but is elementary in that it does not use any complex analysis.

Although the proof of the prime number theorem is usually considered the beginning of analytic number theory, we have seen that analysis was used to prove results in number theory earlier. Euler introduced the zeta function in giving a proof that there are infinitely many primes; we presented this proof in Chapter 3. In his proof, though, the analysis was relatively easy. The first deep use of analysis was made by Dirichlet to prove Dirichlet’s theorem. As we exhibited in Chapter 3, there are many special cases of this result that can be proved by very elementary methods. However, no proof of the complete result is known without analysis.

Given that the prime number theorem has been established, many other questions concerning it can be raised. First of all, notice that if $a$ is any constant then
$$\frac{x}{\ln x} \sim \frac{x}{\ln x - a} \quad \text{for large } x.$$
Hence the prime number theorem is equivalent to
$$\lim_{x\to\infty}\frac{\pi(x)}{x/(\ln x - a)} = 1$$
for any constant $a$. The question arises as to whether there is an optimal value for $a$. Empirical evidence is that $a = 1$ is an optimal choice, generally better for large $x$ than Legendre’s 1.08366, although not as good as Gauss’ $\operatorname{Li}(x)$. The table below compares the estimates.


x       π(x)       x/ln x     Li(x)      x/(ln x - 1.08366)   x/(ln x - 1)
10^3    168        145        178        172                  169
10^4    1229       1086       1246       1231                 1218
10^5    9592       8686       9630       9588                 9512
10^6    78498      72382      78628      78534                78030
10^7    664579     620420     664918     665138               661459
10^8    5761455    5428681    5762209    5769341              5740304


Observing the table above, it is noticed that $\operatorname{Li}(x) > \pi(x)$. The question arises as to whether this is always true. Littlewood in 1914 [Li] proved that $\pi(x) - \operatorname{Li}(x)$ assumes both positive and negative values infinitely often. Te Riele in 1986 [Re] showed that there are more than $10^{180}$ consecutive integers for which $\pi(x) > \operatorname{Li}(x)$ in the range $6.62 \times 10^{370} < x < 6.69 \times 10^{370}$.
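The approximation columns of the comparison table can be recomputed with a short script. In this sketch we assume that $\operatorname{Li}(x)$ denotes the logarithmic integral $\operatorname{li}(x) = \int_0^x \frac{dt}{\ln t}$ (taken as a principal value), which appears to be the normalization behind the tabulated values, and we evaluate it with the classical series $\operatorname{li}(x) = \gamma + \ln\ln x + \sum_{k\ge1}\frac{(\ln x)^k}{k\cdot k!}$, where $\gamma$ is Euler’s constant. Both the normalization and the series are our choices, not the text’s.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def li(x, terms=200):
    """Logarithmic integral li(x) for x > 1, via gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!)."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0  # running value of u**k / k!
    for k in range(1, terms + 1):
        term *= u / k
        total += term / k
    return total

def estimates(x):
    """The four approximations to pi(x) compared in the table."""
    log_x = math.log(x)
    return {
        "x/ln x": x / log_x,
        "Li(x)": li(x),
        "x/(ln x - 1.08366)": x / (log_x - 1.08366),
        "x/(ln x - 1)": x / (log_x - 1),
    }

for exponent in range(3, 9):
    x = 10 ** exponent
    print(f"10^{exponent}:", {name: round(value) for name, value in estimates(x).items()})
```

Two hundred terms of the series are far more than needed for $x \le 10^8$; for much larger $x$ an arbitrary-precision routine such as `mpmath.li` would be the safer choice.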

The prime number function $\pi(x)$ and the prime number theorem answer the basic questions concerning the density of primes. A related question concerns the function
$$P(n) = p_n$$
where $p_n$ is the $n$th prime, that is, the question of whether there is a closed-form function which estimates the $n$th prime. The answer to this is yes, and it turns out to be equivalent to the prime number theorem. We state it below.


Theorem 4.1.2 The $n$th prime $p_n$ is given asymptotically by
$$p_n \sim n\ln n.$$

Proof From the prime number theorem we have that $\pi(x) \sim \frac{x}{\ln x}$. Let
$$y = \frac{x}{\ln x} \implies \ln y = \ln x - \ln\ln x.$$
But $\ln\ln x$ is asymptotically small compared to $\ln x$ and hence
$$\ln y \sim \ln x.$$
Now
$$x = y\ln x \sim y\ln y.$$
This shows that the inverse function to $\frac{x}{\ln x}$ is asymptotically $x\ln x$. But by the prime number theorem this is asymptotically the inverse function of $\pi(x)$, and since $\pi(p_n) = n$, that inverse sends $n$ to $p_n$. □


Notice that if we had started with Theorem 4.1.2 we could have recovered the 
prime number theorem. 
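Theorem 4.1.2 is easy to test numerically. The sketch below is our own: it sieves up to an arbitrarily chosen bound of 2,000,000, which is comfortably large enough to contain $p_n$ for $n \le 100000$, and prints $p_n$ together with $p_n/(n\ln n)$. The ratio creeps down toward 1 only slowly.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

PRIMES = primes_up_to(2_000_000)  # holds p_n for every n up to 100000

def nth_prime_ratio(n):
    """Return (p_n, p_n / (n ln n)) for n >= 2, where p_1 = 2."""
    p_n = PRIMES[n - 1]
    return p_n, p_n / (n * math.log(n))

for n in (10, 100, 1000, 10_000, 100_000):
    p_n, ratio = nth_prime_ratio(n)
    print(f"n = {n}: p_n = {p_n}, p_n/(n ln n) = {ratio:.4f}")
```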


4.2 Chebyshev’s Estimate and Some Consequences 


The first significant progress in developing a proof of the prime number theorem was obtained by Chebyshev in 1848. He proved that the functions $\pi(x)$ and $\frac{x}{\ln x}$ are of the same order of magnitude, a concept we will explain in detail below, and that if $\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x}$ existed then the limit would have to be 1. At first glance it appeared that he was quite close to a proof of the prime number theorem. However, it would take another 50 years and the development of some completely new ideas from complex analysis to actually accomplish this. A proof along the lines of Chebyshev’s methods, without recourse to complex analysis, would not be done until the work of Selberg and Erdős in the late 1940s (see [N]).


Chebyshev proved the following result, now known as Chebyshev’s estimate.

Theorem 4.2.1 There exist positive constants $A_1$ and $A_2$ such that
$$A_1\frac{x}{\ln x} < \pi(x) < A_2\frac{x}{\ln x}$$
for all $x \ge 2$.


The proof we will give is somewhat simpler than that of Chebyshev. The constants 
we arrive at in the proof given below are sufficient but nowhere near best possible. 
We will say more about this at the conclusion of the proof. 

The proof depends on some properties and inequalities involving the binomial coefficients $\binom{n}{k}$. We have used these numbers in several instances in previous sections, but here we begin by formally defining them and then reviewing some of their basic properties.


Definition 4.2.1 Given nonnegative integers $n, k$ with $n \ge 1$ and $n \ge k$, the binomial coefficient $\binom{n}{k}$ is defined as
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
Note that by convention $0! = 1$.


The first several results outline standard properties of the binomial coefficients 
and proofs can be found in any book on probability and statistics. We also outline 
proofs in the exercises. 
Lemma 4.2.1 $\binom{n}{k}$ represents the number of ways of choosing $k$ objects out of $n$ without replacement and without order.


Clearly the number of ways of choosing k objects out of n objects also counts the 
number of possible subsets of size k in a finite set with n elements. 


Corollary 4.2.1 $\binom{n}{k}$ = the number of subsets of size $k$ in a finite set with $n$ elements.


Lemma 4.2.2 (The Binomial Theorem) For any real numbers $a$, $b$ and natural number $n$ we have
$$(a+b)^n = \sum_{k=0}^{n}\binom{n}{k}a^k b^{n-k}.$$

Letting $a = b = 1$ in the Binomial Theorem we get

Corollary 4.2.2 $(1+1)^n = 2^n = \sum_{k=0}^{n}\binom{n}{k}$. In particular $\binom{n}{k} \le 2^n$ for all $k$ with $0 \le k \le n$.


Combining Corollaries 4.2.1 and 4.2.2 we obtain the well-known result that the number of subsets of a set with $n$ elements is $2^n$. Consider a set with $n$ elements. Then
$$\text{total number of subsets} = \text{number of subsets of size } 0 + \cdots + \text{number of subsets of size } n = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n.$$

Lemma 4.2.3 $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$ for $1 \le k \le n$.


This last lemma is the basis of Pascal’s Triangle, in which each row consists of the binomial coefficients for that numbered row.

Each subsequent row is formed by placing a 1 at each end; every other entry is placed between two numbers in the previous row and is their sum. For example, the row
1 3 3 1
is followed by
1 4 6 4 1
since $1 + 3 = 4$, $3 + 3 = 6$, $3 + 1 = 4$.
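The construction of Pascal’s Triangle from Lemma 4.2.3 can be written as a short loop. This sketch is our own illustration; it builds each row from the one above it using only the rule that an interior entry is the sum of the two entries above it.

```python
def pascal_rows(count):
    """Return the first `count` rows (count >= 1) of Pascal's Triangle, built with Lemma 4.2.3."""
    rows = [[1]]
    while len(rows) < count:
        previous = rows[-1]
        # A 1 at each end; each interior entry is the sum of the two entries above it.
        interior = [previous[i] + previous[i + 1] for i in range(len(previous) - 1)]
        rows.append([1] + interior + [1])
    return rows

for row in pascal_rows(6):
    print(" ".join(str(entry) for entry in row).center(24))
```

Row $n$ of the output (counting from 0) is $\binom{n}{0}, \ldots, \binom{n}{n}$, and its entries sum to $2^n$, in line with Corollary 4.2.2.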


The final standard idea we will need is Stirling’s approximation (see Section 3.1.6)
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$
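A quick numerical look at how good Stirling’s approximation is, in a sketch of our own: the ratio $n!/\left(\sqrt{2\pi n}(n/e)^n\right)$ is slightly larger than 1 and approaches 1 as $n$ grows.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n as a float (overflows for n above about 170)."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10, 50, 100):
    print(n, math.factorial(n) / stirling(n))
```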


For Chebyshev’s estimate we need the following results, which are deeper and use number theory. Here $\pi$ is the prime number function.

Lemma 4.2.4 (i) $n^{\pi(2n)-\pi(n)} \le \binom{2n}{n} \le (2n)^{\pi(2n)}$,
(ii) $2^n \le \binom{2n}{n} \le 2^{2n}$.


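Proof If $p$ is a prime, let $e_p$ be the highest power such that $p^{e_p} \mid n!$. Then by an easy induction we have
$$e_p = \sum_{i=1}^{t_p}\left[\frac{n}{p^i}\right]$$
where $[\ ]$ is the greatest integer function and $t_p$ is the first integer such that $p^{t_p+1} > n$. Clearly such a $t_p$ exists for each prime $p$. Now consider
$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} = \frac{(2n)(2n-1)\cdots(n+1)}{n!}.$$
Given a prime $p$, let $m_p$ be the highest power such that $p^{m_p} \mid \binom{2n}{n}$. From the observation above
$$m_p = \sum_{i=1}^{k_p}\left(\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right]\right)$$
where here $k_p$ is the first integer such that $p^{k_p+1} > 2n$.
If $1 \le i \le k_p$, then
$$\frac{2n}{p^i} - 1 < \left[\frac{2n}{p^i}\right] \le \frac{2n}{p^i} \quad\text{and}\quad \frac{2n}{p^i} - 2 < 2\left[\frac{n}{p^i}\right] \le \frac{2n}{p^i},$$
so that $-1 < \left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] < 2$. Since $\left[\frac{2n}{p^i}\right]$ and $2\left[\frac{n}{p^i}\right]$ are integers it follows that
$$\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] \le 1$$
if $1 \le i \le k_p$. This then implies that
$$m_p \le k_p.$$
Therefore
$$p^{m_p} \le p^{k_p} \le 2n$$
and hence
$$\binom{2n}{n} = \prod_{p\le 2n}p^{m_p} \le \prod_{p\le 2n}p^{k_p} \le \prod_{p\le 2n}2n = (2n)^{\pi(2n)},$$
giving one side of the first inequality.
On the other hand, if $n < p \le 2n$ then $p \mid (2n)!$ but $p$ does not divide $n!$. It follows that
$$\prod_{n<p\le 2n}p \;\Big|\; \binom{2n}{n} \quad\text{and so}\quad \prod_{n<p\le 2n}p \le \binom{2n}{n}.$$
Now
$$\prod_{n<p\le 2n}p \ge \prod_{n<p\le 2n}n = n^{\pi(2n)-\pi(n)}$$
since there are $\pi(2n) - \pi(n)$ primes in the range $n < p \le 2n$. Therefore
$$n^{\pi(2n)-\pi(n)} \le \binom{2n}{n},$$
establishing the other side of the first inequality.
For the second inequality we have
$$\binom{2n}{n} \le (1+1)^{2n} = 2^{2n}$$
and from above
$$\binom{2n}{n} = \frac{(2n)(2n-1)\cdots(n+1)}{n!} = \prod_{j=1}^{n}\frac{n+j}{j} \ge \prod_{j=1}^{n}2 = 2^n,$$
establishing the second inequality.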
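The proof rests on the formula for $e_p$ and on the fact that each term $\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right]$ is 0 or 1. The following sketch (our own check, not part of the proof; it uses `math.comb` and `math.prod`, available from Python 3.8) computes $m_p$ this way and verifies, for small $n$, both parts of Lemma 4.2.4 together with $p^{m_p} \le 2n$ and $\binom{2n}{n} = \prod_{p\le 2n}p^{m_p}$.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

def legendre_exponent(n, p):
    """e_p: exponent of the prime p in n!, i.e. the sum of [n / p^i] over i >= 1."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total

def central_exponent(n, p):
    """m_p: exponent of p in C(2n, n), equal to e_p of (2n)! minus twice e_p of n!."""
    return legendre_exponent(2 * n, p) - 2 * legendre_exponent(n, p)

def lemma_4_2_4_holds(n):
    """Check parts (i) and (ii) of Lemma 4.2.4, plus the intermediate facts, for one n >= 1."""
    c = math.comb(2 * n, n)
    primes = primes_up_to(2 * n)
    pi_2n = len(primes)
    pi_n = sum(1 for p in primes if p <= n)
    exponents = {p: central_exponent(n, p) for p in primes}
    factorization_ok = math.prod(p ** m for p, m in exponents.items()) == c
    small_powers_ok = all(p ** m <= 2 * n for p, m in exponents.items())
    part_i = n ** (pi_2n - pi_n) <= c <= (2 * n) ** pi_2n
    part_ii = 2 ** n <= c <= 2 ** (2 * n)
    return factorization_ok and small_powers_ok and part_i and part_ii

print(all(lemma_4_2_4_holds(n) for n in range(1, 300)))
```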


We now give the proof of Chebyshev’s estimate.


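Proof (Theorem 4.2.1) We have to show that there exist positive constants $A_1$ and $A_2$ such that
$$A_1\frac{x}{\ln x} < \pi(x) < A_2\frac{x}{\ln x}$$
for all $x \ge 2$.
From the previous lemma we have the inequalities
$$n^{\pi(2n)-\pi(n)} \le \binom{2n}{n} \le (2n)^{\pi(2n)},$$
$$2^n \le \binom{2n}{n} \le 2^{2n}.$$
Hence, for $n \ge 2$,
$$n^{\pi(2n)-\pi(n)} \le 2^{2n} \implies (\pi(2n) - \pi(n))\ln n \le 2n\ln 2 \implies \pi(2n) - \pi(n) \le \frac{2n\ln 2}{\ln n}.$$
On the other hand
$$(2n)^{\pi(2n)} \ge 2^n \implies \pi(2n) \ge \frac{n\ln 2}{\ln(2n)}.$$
For a real variable $x \ge 2$ let $2n$ be the greatest even integer not exceeding $x$, so that $x \ge 2n$, $n \ge 1$ and $x < 2n + 2$. Then
$$\pi(x) \ge \pi(2n) \ge \frac{n\ln 2}{\ln(2n)} \ge \frac{n\ln 2}{\ln x} \ge \frac{(2n+2)\ln 2}{4\ln x} > \frac{\ln 2}{4}\,\frac{x}{\ln x}.$$
Therefore
$$\pi(x) > A_1\frac{x}{\ln x}$$
for all $x \ge 2$ with $A_1 = \frac{\ln 2}{4}$.

To find $A_2$, let $2n = 2^t$ with $t \ge 3$. Then
$$\pi(2^t) - \pi(2^{t-1}) \le \frac{2^t\ln 2}{(t-1)\ln 2} = \frac{2^t}{t-1}.$$
Consider the telescoping sum
$$\sum_{t=3}^{2j}\left(\pi(2^t) - \pi(2^{t-1})\right) = \pi(2^{2j}) - \pi(4).$$
Since $\pi(4) = 2 < 4 = \frac{2^2}{2-1}$ and $\pi(2^t) - \pi(2^{t-1}) \le \frac{2^t}{t-1}$, we obtain using the telescoping sum that
$$\pi(2^{2j}) \le \sum_{t=2}^{2j}\frac{2^t}{t-1} = \sum_{t=2}^{j}\frac{2^t}{t-1} + \sum_{t=j+1}^{2j}\frac{2^t}{t-1}.$$
Now
$$\sum_{t=2}^{j}\frac{2^t}{t-1} \le \sum_{t=2}^{j}2^t < 2^{j+1}$$
and
$$\sum_{t=j+1}^{2j}\frac{2^t}{t-1} \le \frac{1}{j}\sum_{t=j+1}^{2j}2^t < \frac{2^{2j+1}}{j}.$$
It follows that
$$\pi(2^{2j}) < 2^{j+1} + \frac{2^{2j+1}}{j}.$$
Since $j < 2^j$ we have $2^{j+1} < \frac{2^{2j+1}}{j}$, and therefore for $j \ge 2$
$$\pi(2^{2j}) < \frac{2^{2j+2}}{j}.$$
This implies that
$$\frac{\pi(2^{2j})}{2^{2j}} < \frac{4}{j} \quad \text{for all } j \ge 2,$$
and this also holds for $j = 1$, since $\frac{\pi(4)}{4} = \frac{1}{2}$.
Let $x \ge 2$ be a real variable. Then there exists an integer $j \ge 1$ such that $2^{2j-2} < x \le 2^{2j}$. Hence
$$\frac{\pi(x)}{x} < \frac{\pi(2^{2j})}{2^{2j-2}} = 4\,\frac{\pi(2^{2j})}{2^{2j}} < \frac{16}{j}.$$
Further,
$$j \ge \frac{\ln x}{2\ln 2} \implies \frac{4}{j} \le \frac{8\ln 2}{\ln x},$$
so that
$$\frac{\pi(x)}{x} < \frac{16}{j} \le \frac{32\ln 2}{\ln x} \implies \pi(x) < (32\ln 2)\frac{x}{\ln x}$$
for all $x \ge 2$. Therefore
$$\pi(x) < A_2\frac{x}{\ln x}$$
for all $x \ge 2$ with $A_2 = 32\ln 2$, establishing Chebyshev’s estimate.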


We mention again that the proof is somewhat simpler than that originally given by Chebyshev and arrives at weaker constants. We obtained $A_1 = \frac{\ln 2}{4}$ and $A_2 = 32\ln 2$, which were sufficient for the theorem but nowhere near best possible. Chebyshev showed that $A_1 = 0.922$ and $A_2 = 1.105$ could be used. His proof actually involved a careful analysis of a form of Stirling’s approximation. The values of the constants in Chebyshev’s inequality have been improved upon many times. Sylvester in 1882 improved the values to $A_1 = 0.95695$ and $A_2 = 1.04423$ for sufficiently large $x$. It can now be shown that $A_1 = 1$ can be used for all $x \ge 17$.
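The constants obtained in the proof can be compared with actual prime counts. The sketch below is our own check, made only at integer points and only up to an arbitrarily chosen bound of $10^5$; it confirms the two inequalities of Theorem 4.2.1 with $A_1 = \frac{\ln 2}{4}$ and $A_2 = 32\ln 2$ and reports the observed range of $\pi(x)\ln x/x$, which shows how generous these constants are.

```python
import math

def prime_counts(limit):
    """Return a list whose entry m is pi(m), for 0 <= m <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    counts, running = [], 0
    for flag in flags:
        running += flag
        counts.append(running)
    return counts

A1 = math.log(2) / 4   # lower constant from the proof
A2 = 32 * math.log(2)  # upper constant from the proof
LIMIT = 10 ** 5
PI = prime_counts(LIMIT)

# Compare pi(m) with both bounds at every integer 2 <= m <= LIMIT.
lower_ok = all(A1 * m / math.log(m) < PI[m] for m in range(2, LIMIT + 1))
upper_ok = all(PI[m] < A2 * m / math.log(m) for m in range(2, LIMIT + 1))
ratios = [PI[m] * math.log(m) / m for m in range(2, LIMIT + 1)]
print(lower_ok, upper_ok, min(ratios), max(ratios))
```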

The following is an immediate corollary of the estimate, independent of the values of $A_1$ and $A_2$.

Corollary 4.2.3 $\frac{\pi(x)}{x} \to 0$ as $x \to \infty$.

Proof From Chebyshev’s estimate we have
$$0 < \pi(x) < A_2\frac{x}{\ln x} \implies 0 < \frac{\pi(x)}{x} < \frac{A_2}{\ln x}.$$
Since $A_2$ is a constant, $\frac{A_2}{\ln x} \to 0$ as $x \to \infty$, so clearly $\frac{\pi(x)}{x} \to 0$ also.


This corollary says that the primes become relatively scarcer as $x$ gets larger. In probabilistic terms it says that the probability that a positive integer chosen at random from those less than or equal to $x$ is prime goes to zero as $x$ goes to infinity.

Before continuing and presenting some consequences of Chebyshev’s result we 
introduce a convenient notation for describing the order of magnitude of a function. 


Definition 4.2.2 Suppose $f(x)$, $g(x)$ are positive real-valued functions. Then

(1) $f(x) = O(g(x))$ (read $f(x)$ is big O of $g(x)$) if there exists a constant $A$ independent of $x$ and an $x_0$ such that
$$f(x) \le Ag(x) \quad \text{for all } x \ge x_0.$$
(2) $f(x) = o(g(x))$ (read $f(x)$ is little o of $g(x)$) if
$$\frac{f(x)}{g(x)} \to 0 \quad \text{as } x \to \infty.$$
In other words $g(x)$ is of a higher order of magnitude than $f(x)$.
(3) If $f(x) = O(g(x))$ and $g(x) = O(f(x))$, that is, there exist positive constants $A_1$, $A_2$ independent of $x$ and an $x_0$ such that
$$A_1 g(x) \le f(x) \le A_2 g(x) \quad \text{for all } x \ge x_0,$$
then we say that $f(x)$ and $g(x)$ are of the same order of magnitude and write $f(x) \asymp g(x)$.
(4) If
$$\lim_{x\to\infty}\frac{f(x)}{g(x)} = 1$$
then we say that $f(x)$ and $g(x)$ are asymptotically equal and we write $f(x) \sim g(x)$.

In general, we write $O(g)$ or $o(g)$ to signify an unspecified function $f$ such that $f = O(g)$ or $f = o(g)$. Hence, for example, writing $f = g + o(x)$ means that $\frac{f - g}{x} \to 0$, and saying that $f$ is $o(1)$ means that $f(x) \to 0$ as $x \to \infty$.

It is clear that being $o(g)$ implies being $O(g)$ but not necessarily the other way around. Further, it is easy to see that
$$f \sim g \quad\text{is equivalent to}\quad f = g + o(g) = g(1 + o(1)).$$


In terms of the notation above, Chebyshev’s estimate can be expressed as
$$\pi(x) \asymp \frac{x}{\ln x}.$$
Further, the prime number theorem can be expressed by
$$\pi(x) \sim \frac{x}{\ln x}$$
or equivalently
$$\pi(x) = \frac{x}{\ln x}(1 + o(1)).$$


We will use this notation freely as we develop the proof of the prime number theorem. 

We now present some consequences of Chebyshev’s estimate. It was mentioned at the end of the previous section that the prime number theorem is equivalent to $p_n \sim n\ln n$, where $p_n$ denotes the $n$th prime (Theorem 4.1.2). Chebyshev’s estimate gives immediately that $p_n$ and $n\ln n$ are of the same order of magnitude.
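Theorem 4.2.2 There exist positive constants $B_1$, $B_2$ such that
$$B_1 n\ln n \le p_n \le B_2 n\ln n \quad \text{for all } n \ge 2.$$
Equivalently
$$p_n \asymp n\ln n.$$

Proof Let $p_n$ be the $n$th prime. Then clearly $\pi(p_n) = n$. From Chebyshev’s estimate
$$n = \pi(p_n) < A_2\frac{p_n}{\ln p_n}$$
for all $n \ge 2$. This implies
$$\frac{1}{A_2}\,n\ln p_n < p_n \quad \text{for all } n \ge 2.$$
However, $p_n > n$, so
$$\frac{1}{A_2}\,n\ln n < \frac{1}{A_2}\,n\ln p_n < p_n \quad \text{for all } n \ge 2.$$
Therefore
$$B_1 n\ln n < p_n$$
for all $n \ge 2$ with $B_1 = \frac{1}{A_2}$.

In the other direction, we have
$$n = \pi(p_n) > A_1\frac{p_n}{\ln p_n}.$$
Since $p_n > n$, we have $p_n \to \infty$, and it follows that $\frac{\ln p_n}{\sqrt{p_n}} \to 0$ as $n \to \infty$. Therefore, there exists a constant $k$ such that
$$\frac{\ln p_n}{\sqrt{p_n}} < A_1 \quad \text{if } n > k.$$
Hence
$$\frac{n}{\sqrt{p_n}} > A_1\frac{\sqrt{p_n}}{\ln p_n} > 1 \quad \text{if } n > k.$$
It follows that $n > \sqrt{p_n}$ and so $\ln p_n < 2\ln n$ if $n > k$. Consequently
$$p_n < \frac{1}{A_1}\,n\ln p_n < \frac{2}{A_1}\,n\ln n \quad \text{if } n > k.$$
Let
$$B_2 = \max\left\{\frac{2}{A_1}, \frac{p_2}{2\ln 2}, \frac{p_3}{3\ln 3}, \ldots, \frac{p_k}{k\ln k}\right\}.$$
Then
$$p_n \le B_2 n\ln n \quad \text{for all } n \ge 2.$$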
Note that we could have proved Theorem 4.2.2 and then deduced Chebyshev’s estimate from it. This result also provides a very simple proof of Euler’s theorem, given in Chapter 3, that the series $\sum_p \frac{1}{p}$ diverges.

Corollary 4.2.4 The sum
$$\sum_{p \text{ prime}}\frac{1}{p}$$
diverges.

Proof For $n \ge 2$ we have $\frac{1}{p_n} \ge \frac{1}{B_2 n\ln n}$ from the last theorem. However, the series $\sum_{n=2}^{\infty}\frac{1}{n\ln n}$ diverges by the integral test.


Although there are infinitely many primes and $\sum_p \frac{1}{p}$ diverges, it still diverges very slowly. Using the methods applied in the proof of Chebyshev’s estimate we can actually bound the growth of the series of reciprocals of the primes.


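Theorem 4.2.3 There exists a constant $k$ such that
$$\sum_{2\le p\le x}\frac{1}{p} < k\ln\ln x \quad \text{if } x \ge 3.$$

Proof From Theorem 4.2.2 we have
$$p_n \ge B_1 n\ln n \quad (n \ge 2).$$
Therefore, since $\pi(x) \le [x]$,
$$\sum_{p\le x}\frac{1}{p} = \sum_{n=1}^{\pi(x)}\frac{1}{p_n} \le \frac{1}{2} + \sum_{n=2}^{\pi(x)}\frac{1}{B_1 n\ln n} \le \frac{1}{2} + \frac{1}{B_1}\sum_{n=2}^{[x]}\frac{1}{n\ln n}.$$
However
$$\frac{1}{n\ln n} < \int_{n-1}^{n}\frac{dt}{t\ln t}$$
since $\frac{1}{n\ln n} < \frac{1}{t\ln t}$ on $[n-1, n)$ if $n \ge 3$. Then
$$\sum_{n=2}^{[x]}\frac{1}{n\ln n} = \frac{1}{2\ln 2} + \sum_{n=3}^{[x]}\frac{1}{n\ln n} < \frac{1}{2\ln 2} + \int_2^x\frac{dt}{t\ln t} = \frac{1}{2\ln 2} + \ln\ln x - \ln\ln 2.$$
Hence
$$\sum_{p\le x}\frac{1}{p} < \frac{1}{2} + \frac{1}{2B_1\ln 2} - \frac{\ln\ln 2}{B_1} + \frac{1}{B_1}\ln\ln x = \frac{1}{B_1}\ln\ln x + C < k\ln\ln x,$$
taking $k$ large enough (possible since $\ln\ln x \ge \ln\ln 3 > 0$ for $x \ge 3$).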
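To see how slowly the sum of the reciprocals of the primes grows, one can compute it next to $\ln\ln x$. This is a sketch of our own, with an arbitrary cut-off of $10^6$.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

def reciprocal_prime_sum(x):
    """Sum of 1/p over the primes p <= x."""
    return sum(1 / p for p in primes_up_to(x))

for exponent in range(2, 7):
    x = 10 ** exponent
    s = reciprocal_prime_sum(x)
    print(f"x = 10^{exponent}: sum of 1/p = {s:.4f}, ln ln x = {math.log(math.log(x)):.4f}")
```

In this range the two columns stay close together, which is consistent with the bound of Theorem 4.2.3.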


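In a similar vein we get the following result, which bounds the product of all the primes $p$ less than or equal to some given $x$.

Theorem 4.2.4 If $x \ge 2$ then $\prod_{p\le x}p < 4^x$.

Proof The theorem is clear for $2 \le x < 3$. Suppose the theorem is true for an odd integer $n$ with $n \ge 3$. Then it is true for $n \le x < n + 2$, since $n + 1$ is even and greater than 2, hence not prime, and so
$$\prod_{p\le x}p = \prod_{p\le n}p < 4^n \le 4^x.$$
Therefore it is sufficient to prove the theorem for odd integers $n$. We do an induction on the odd integers. The theorem is true for $n = 3$, and so we assume that $n \ge 5$ is odd and that the theorem is true for all odd integers less than $n$. Let $k = \frac{n+1}{2}$ or $k = \frac{n-1}{2}$, chosen so that $k$ is also odd. Then $k \ge 3$ and $n - k$ is even. Further, $n - k \le 2k + 1 - k = k + 1$.
If $p$ is a prime with $k < p \le n$ then $p \mid n!$ but $p$ divides neither $k!$ nor $(n-k)!$ (since $k + 1$ is even and greater than 2, we have $p > k + 1 \ge n - k$). Therefore
$$p \;\Big|\; \binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
It follows that the product of all such primes divides $\binom{n}{k}$ and hence
$$\prod_{k<p\le n}p \le \binom{n}{k}.$$
Since $\binom{n}{k} = \binom{n}{n-k}$, $k \ne n - k$, and both are terms in the binomial expansion of $(1+1)^n$, it follows that $\binom{n}{k} \le 2^{n-1}$. Therefore, using that $k < n$, $2k \le n + 1$ and the inductive hypothesis,
$$\prod_{p\le n}p = \prod_{p\le k}p \cdot \prod_{k<p\le n}p < 4^k\,2^{n-1} \le 2^{n+1}\,2^{n-1} = 4^n.$$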
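Theorem 4.2.4 can be checked exactly with integer arithmetic. Because the product of the primes up to $x$ changes only at integers while $4^x$ increases, it is enough to check integers $n \ge 2$; the sketch below, which is our own, does this up to an arbitrarily chosen bound.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

def primorial_bound_holds(limit):
    """Check that the product of the primes p <= n is less than 4**n for every integer 2 <= n <= limit."""
    primes = set(primes_up_to(limit))
    product = 1
    for n in range(2, limit + 1):
        if n in primes:
            product *= n          # product is now the product of all primes <= n
        if product >= 4 ** n:     # exact integer comparison, no rounding
            return False
    return True

print(primorial_bound_holds(3000))
```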


Finally, based on many of these estimates we can provide a proof of Bertrand’s Theorem (actually proved by Chebyshev), which we introduced in the last chapter. Recall that this theorem says that given any natural number $n$ there is always a prime between $n$ and $2n$. The proof actually shows that given any real number $x \ge 1$ there exists a prime between $x$ and $2x$.

Theorem 4.2.5 (Bertrand’s Theorem) For every natural number $n \ge 1$ there is a prime $p$ such that $n < p \le 2n$.


Proof By direct computation the theorem is easily established for $n < 128$. Now suppose that for some $n \ge 128$ there is no prime between $n$ and $2n$. For a prime $p$ let $m_p$ be the highest power of $p$ dividing $\binom{2n}{n}$ and $k_p$ the first power such that $p^{k_p+1} > 2n$, as in the proof of Chebyshev’s estimate. Then, as in that proof, since we assume there are no primes in the range $n$ to $2n$, we have
$$\binom{2n}{n} = \prod_{p\le 2n}p^{m_p} = \prod_{p\le n}p^{m_p}, \qquad m_p = \sum_{i=1}^{k_p}\left(\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right]\right) \le k_p.$$
Here we use $[x]$ to indicate the greatest integer function, that is, the greatest integer less than or equal to $x$.

Now if $\frac{2n}{3} < p \le n$, then $p^2 > \frac{4n^2}{9} > 2n$ (as $n \ge 128$), so $k_p = 1$, and $2 \le \frac{2n}{p} < 3$; therefore
$$m_p = \left[\frac{2n}{p}\right] - 2\left[\frac{n}{p}\right] = 2 - 2 = 0.$$
If $\sqrt{2n} < p \le \frac{2n}{3}$ then we have $p^2 > 2n$ and hence $k_p = 1$ and so $m_p \le 1$. Finally, if $p \le \sqrt{2n}$ we have $p^{m_p} \le p^{k_p} \le 2n$. Therefore
$$\binom{2n}{n} = \prod_{p\le\sqrt{2n}}p^{m_p}\prod_{\sqrt{2n}<p\le\frac{2n}{3}}p^{m_p}\prod_{\frac{2n}{3}<p\le n}p^{m_p} \le \prod_{p\le\sqrt{2n}}2n\prod_{\sqrt{2n}<p\le\frac{2n}{3}}p.$$

For a real number $x \ge 15$ we have $\pi(x) < \frac{x}{2} - 1$. Indeed, apart from 2, every prime up to $x$ is one of the at most $\frac{x-1}{2}$ odd integers $m$ with $3 \le m \le x$, and at least two of these, 9 and 15, are not prime, so $\pi(x) \le 1 + \frac{x-1}{2} - 2 < \frac{x}{2} - 1$. Since $n \ge 128$ gives $\sqrt{2n} \ge 16$, it follows that $\pi(\sqrt{2n}) < \frac{\sqrt{2n}}{2} - 1$ and hence
$$\prod_{p\le\sqrt{2n}}2n \le (2n)^{\frac{\sqrt{2n}}{2}-1}.$$
Further, from Theorem 4.2.4 we have
$$\prod_{\sqrt{2n}<p\le\frac{2n}{3}}p \le \prod_{p\le\frac{2n}{3}}p < 4^{\frac{2n}{3}}.$$
Therefore
$$\binom{2n}{n} < (2n)^{\frac{\sqrt{2n}}{2}-1}\,4^{\frac{2n}{3}}.$$
Now
$$2^{2n} = (1+1)^{2n} = 1 + \binom{2n}{1} + \cdots + \binom{2n}{n} + \cdots + \binom{2n}{2n-1} + 1.$$
There are $2n + 1$ terms in this expansion and $\binom{2n}{n}$ is the largest. Combining the two outside terms ($1 + 1 = 2$) we have $2n$ terms, each at most $\binom{2n}{n}$, and therefore
$$2^{2n} \le 2n\binom{2n}{n} \implies \binom{2n}{n} \ge (2n)^{-1}2^{2n}.$$
Combining these two inequalities gives
$$(2n)^{-1}2^{2n} < (2n)^{\frac{\sqrt{2n}}{2}-1}4^{\frac{2n}{3}} \implies 2^{\frac{2n}{3}} < (2n)^{\frac{\sqrt{2n}}{2}}.$$
Taking logarithms then yields
$$\frac{2n}{3}\ln 2 < \sqrt{\frac{n}{2}}\,\ln(2n) \implies \sqrt{8n}\,\ln 2 - 3\ln(2n) < 0.$$
We show that this is a contradiction for all $n \ge 128$.
Let $F(x) = \sqrt{8x}\,\ln 2 - 3\ln(2x)$. Then $F(128) = 8\ln 2 > 0$. Further,
$$F'(x) = \frac{\sqrt{2}\,\ln 2}{\sqrt{x}} - \frac{3}{x} = \frac{\sqrt{2x}\,\ln 2 - 3}{x}.$$
This last expression is positive for $x \ge 128$, and hence $F(x)$ is an increasing function for $x \ge 128$. Since $F(128) > 0$ it follows that $F(x) > 0$ for all $x \ge 128$. Therefore
$$\sqrt{8n}\,\ln 2 - 3\ln(2n) > 0$$
for $n \ge 128$. This is impossible, and hence a contradiction. Therefore there must be a prime between $n$ and $2n$ for any integer $n \ge 1$.
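The two computational ingredients of the proof, the direct check for $n < 128$ and the positivity of $F$, are easy to confirm numerically. The sketch below is our own; trial division is used only because the numbers involved are small, and evaluating $F$ at sample points is a spot check, not a substitute for the derivative argument.

```python
import math

def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

def has_prime_between(n):
    """True when some prime p satisfies n < p <= 2n."""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))

def F(x):
    """F(x) = sqrt(8x) ln 2 - 3 ln(2x), the function used at the end of the proof."""
    return math.sqrt(8 * x) * math.log(2) - 3 * math.log(2 * x)

print(all(has_prime_between(n) for n in range(1, 128)))  # the direct-computation step
print(F(128), 8 * math.log(2))                             # equal up to rounding
```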


4.3 Equivalent Formulations of the Prime Number Theorem


The proof of the prime number theorem rests on the analysis of three additional functions besides the prime number function $\pi(x)$. The first and most important of these is the Riemann zeta function $\zeta(s)$. As was discussed in the previous chapter, this function was introduced for real $s > 1$ by Euler in proving that there are infinitely many primes (see Section 3.3). The function was then modified by Dirichlet and used in proving that there are infinitely many primes of the form $an + b$ with $(a, b) = 1$. Riemann extended the definition to allow the variable $s$ to be complex and showed how knowledge of the location of the zeros of the now complex function $\zeta(s)$ in the complex plane would imply the prime number theorem. We will discuss the zeta function and describe its ties to the prime number theorem in the next section. The other two functions that must be analyzed are known as the Chebyshev functions. The first, denoted $\theta(x)$, is defined for a real variable $x$ by
$$\theta(x) = \sum_{p\le x}\ln p \quad \text{with } p \text{ prime} \tag{4.3.1}$$
while the second, denoted $\psi(x)$, is defined, again for a real variable $x$, by
$$\psi(x) = \sum_{p^k\le x,\ k\ge 1}\ln p \quad \text{with } p \text{ prime}. \tag{4.3.2}$$
These functions count, respectively, the primes $p \le x$ and the prime powers $p^k \le x$, each weighted by $\ln p$. Recall that the von Mangoldt function $\Lambda(n)$ is defined for positive integers by
$$\Lambda(n) = \begin{cases}\ln p & \text{if } n = p^k,\ k \ge 1,\\ 0 & \text{for all other } n > 0.\end{cases}$$
Hence the Chebyshev function $\psi(x)$ is actually the summatory function of $\Lambda(n)$. That is,
$$\psi(x) = \sum_{n\le x}\Lambda(n).$$
Further, for a given prime $p \le x$ the number of times $\ln p$ is counted in the sum for $\psi(x)$ is $\left[\frac{\ln x}{\ln p}\right]$. Hence $\psi(x)$ can also be expressed as
$$\psi(x) = \sum_{p\le x}\left[\frac{\ln x}{\ln p}\right]\ln p.$$

In the type of notation we have used in defining the Chebyshev functions, the prime number function can be expressed as
$$\pi(x) = \sum_{p\le x,\ p \text{ prime}}1. \tag{4.3.3}$$


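There are certain immediate relationships between these three functions. First, every prime $p \le x$ is also a prime power $p^1 \le x$, so clearly
$$\theta(x) \le \psi(x).$$
Further, since $1 < \ln p$ for $p \ge 3$, we have
$$\pi(x) < \theta(x) \quad \text{for } x \ge 5.$$
Now if $p^k \le x$ then $k \le \left[\frac{\ln x}{\ln p}\right]$. It follows that
$$\psi(x) = \sum_{\substack{p^k\le x\\ k\ge 1}}\ln p = \sum_{p\le x}\ \sum_{\substack{k\ge 1\\ p^k\le x}}\ln p = \sum_{p\le x}\left[\frac{\ln x}{\ln p}\right]\ln p \le \sum_{p\le x}\frac{\ln x}{\ln p}\ln p = \pi(x)\ln x.$$
Therefore
$$\psi(x) \le \pi(x)\ln x.$$
Now $\theta(x) = \sum_{p\le x}\ln p = \ln\left(\prod_{p\le x}p\right)$. However, from Theorem 4.2.4 we have $\prod_{p\le x}p < 4^x$. Therefore
$$\theta(x) < x\ln 4$$
and consequently
$$\theta(x) = O(x).$$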


We will need the following lemma, which says that, relative to $x$, $\theta(x)$ and $\psi(x)$ differ only by a term of lower order.

Lemma 4.3.1 $\psi(x) = \theta(x) + O\left(x^{1/2}(\ln x)^2\right)$.

Proof By definition
$$\psi(x) = \sum_{p^k\le x,\ k\ge 1}\ln p.$$
For a given prime $p \le x$ let $p^t$ be the highest power of $p$ such that $p^t \le x$. Then
$$p^k \le x \iff p \le x^{1/k}, \quad\text{so}\quad p \le x,\ p \le x^{1/2},\ \ldots,\ p \le x^{1/t}.$$
It follows that
$$\psi(x) = \theta(x) + \theta(x^{1/2}) + \cdots + \theta(x^{1/m})$$
where $m$ is the first integer such that $m + 1 > \frac{\ln x}{\ln 2}$. We have
$$\theta(x) = \sum_{p\le x}\ln p \le \sum_{p\le x}\ln x \le x\ln x \quad \text{if } x \ge 2.$$
It follows that
$$\theta(x^{1/k}) \le x^{1/k}\ln x \le x^{1/2}\ln x \quad \text{if } x \ge 2 \text{ and } k \ge 2.$$
In the sum
$$\sum_{k=2}^{m}\theta(x^{1/k})$$
there are $O(\ln x)$ terms, since $m - 1 < \frac{\ln x}{\ln 2}$. This, coupled with the fact that $\theta(x^{1/k}) \le x^{1/2}\ln x$, gives that
$$\sum_{k=2}^{m}\theta(x^{1/k}) = O\left(x^{1/2}(\ln x)^2\right).$$
Therefore
$$\psi(x) = \theta(x) + O\left(x^{1/2}(\ln x)^2\right).$$


It follows immediately from this lemma and the fact that $x^{1/2}(\ln x)^2 = o(x)$ that if there exists a constant $A$ with $\theta(x) \le Ax$ then there exists a constant $B$ such that $\psi(x) \le Bx$, and if there exists a constant $C$ with $Cx \le \psi(x)$ then there exists a constant $D$ with $Dx \le \theta(x)$, in each case for all sufficiently large $x$.

We extend these observations to show that $\theta(x)$ and $\psi(x)$ both have order of magnitude $x$.


Theorem 4.3.1 There exist positive constants $A_1$, $A_2$, $B_1$, $B_2$ such that
$$A_1 x \le \theta(x) \le A_2 x,$$
$$B_1 x \le \psi(x) \le B_2 x.$$
That is, $\theta(x) \asymp x$ and $\psi(x) \asymp x$.

Proof In light of the comments made preceding the theorem, it suffices to bound $\theta(x)$ above and $\psi(x)$ below. From Theorem 4.2.4 we have that $\prod_{p\le x}p < 4^x$. This implies that $\theta(x) = \sum_{p\le x}\ln p < x\ln 4$, and hence $\theta(x) < Bx$ with $B = \ln 4$. This bounds $\theta(x)$ above.

We now show that we can bound $\psi(x)$ below. This is similar to the proof given for Chebyshev’s estimate. As in that proof, if $p$ is a prime, let $m_p$ be the highest power of $p$ such that $p^{m_p} \mid \binom{2n}{n}$ and let $k_p$ be the first exponent such that $p^{k_p+1} > 2n$. Then as before
$$\binom{2n}{n} = \prod_{p\le 2n}p^{m_p}$$
and
$$m_p \le k_p = \left[\frac{\ln 2n}{\ln p}\right].$$
It follows that
$$\ln\binom{2n}{n} = \sum_{p\le 2n}m_p\ln p \le \sum_{p\le 2n}\left[\frac{\ln 2n}{\ln p}\right]\ln p = \psi(2n).$$
Further, from Lemma 4.2.4(ii)
$$\binom{2n}{n} \ge 2^n \implies \psi(2n) \ge n\ln 2.$$
If $x \ge 2$ let $n = \left[\frac{x}{2}\right] \ge 1$, and then
$$\psi(x) \ge \psi(2n) \ge n\ln 2 \ge \frac{x}{4}\ln 2.$$
Therefore $\psi(x) \ge Cx$ with $C = \frac{\ln 2}{4}$, completing the proof.
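The bounds of Theorem 4.3.1 and the size of the difference in Lemma 4.3.1 can be observed directly. The sketch below is our own: it computes $\theta(x)$ and $\psi(x)$ in one pass over the primes up to $x$ and prints $\theta(x)/x$, $\psi(x)/x$ and $(\psi(x) - \theta(x))/(x^{1/2}(\ln x)^2)$ for a few powers of ten.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

def theta_and_psi(x):
    """Return (theta(x), psi(x)) for an integer x >= 2."""
    theta = psi = 0.0
    for p in primes_up_to(x):
        log_p = math.log(p)
        theta += log_p
        power = p
        while power <= x:   # psi gets one ln p for each prime power p^k <= x
            psi += log_p
            power *= p
    return theta, psi

for exponent in range(2, 7):
    x = 10 ** exponent
    theta, psi = theta_and_psi(x)
    scale = math.sqrt(x) * math.log(x) ** 2  # the order of the error term in Lemma 4.3.1
    print(f"x = 10^{exponent}: theta/x = {theta / x:.4f}, psi/x = {psi / x:.4f}, "
          f"(psi - theta)/scale = {(psi - theta) / scale:.4f}")
```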


Considering again the result of Lemma 4.3.1 that
$$\psi(x) = \theta(x) + O\left(x^{1/2}(\ln x)^2\right),$$
coupled with the fact that $x^{1/2}(\ln x)^2 = o(x)$, we obtain that
$$\frac{\psi(x)}{x} = \frac{\theta(x)}{x} + o(1).$$
In particular this implies that
$$\lim_{x\to\infty}\frac{\psi(x)}{x} = 1 \quad\text{if and only if}\quad \lim_{x\to\infty}\frac{\theta(x)}{x} = 1.$$
In the notation we introduced earlier this says that
$$\psi(x) \sim x \quad\text{if and only if}\quad \theta(x) \sim x.$$
We show now that each of these statements is equivalent to the prime number theorem.


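Theorem 4.3.2 The following are all equivalent formulations of the prime number theorem:
(a) $\pi(x) \sim \frac{x}{\ln x}$,
(b) $\theta(x) \sim x$,
(c) $\psi(x) \sim x$.

Proof From the remarks immediately preceding the theorem we have $\theta(x) \sim x$ if and only if $\psi(x) \sim x$. Therefore, it is sufficient to show that $\pi(x) \sim \frac{x}{\ln x}$ is equivalent to $\theta(x) \sim x$.

We have that $\theta(x) \le \pi(x)\ln x$ and further $Ax \le \theta(x)$ for some constant $A > 0$. Therefore
$$\pi(x) \ge \frac{\theta(x)}{\ln x} \ge \frac{Ax}{\ln x}.$$
For any real $\varepsilon$ with $0 < \varepsilon < 1$ we have
$$\theta(x) = \sum_{p\le x}\ln p \ge \sum_{x^{1-\varepsilon}<p\le x}\ln p \ge (1-\varepsilon)\ln x\sum_{x^{1-\varepsilon}<p\le x}1 = (1-\varepsilon)\ln x\left(\pi(x) - \pi(x^{1-\varepsilon})\right) \ge (1-\varepsilon)\ln x\left(\pi(x) - x^{1-\varepsilon}\right)$$
since $x^{1-\varepsilon} > \pi(x^{1-\varepsilon})$. It follows that
$$\pi(x) \le x^{1-\varepsilon} + \frac{\theta(x)}{(1-\varepsilon)\ln x}.$$
Combining these inequalities gives
$$\frac{Ax}{\ln x} \le \frac{\theta(x)}{\ln x} \le \pi(x) \le x^{1-\varepsilon} + \frac{\theta(x)}{(1-\varepsilon)\ln x},$$
from which it follows that
$$1 \le \frac{\pi(x)\ln x}{\theta(x)} \le \frac{x^{1-\varepsilon}\ln x}{\theta(x)} + \frac{1}{1-\varepsilon}.$$
Now $\theta(x) \ge Ax$, so
$$\frac{x^{1-\varepsilon}\ln x}{\theta(x)} \le \frac{\ln x}{Ax^{\varepsilon}}.$$
Since $\varepsilon$ is arbitrary in $(0, 1)$, the value $\frac{1}{1-\varepsilon}$ can be made arbitrarily close to 1. Further, for a fixed $\varepsilon$ the value $\frac{\ln x}{Ax^{\varepsilon}}$ can be made arbitrarily small by choosing $x$ large. Therefore
$$\frac{x^{1-\varepsilon}\ln x}{\theta(x)} + \frac{1}{1-\varepsilon} < 1 + \varepsilon_1$$
for $x$ large enough, with $\varepsilon_1 > 0$ arbitrarily small. Hence we have
$$1 \le \frac{\pi(x)\ln x}{\theta(x)} < 1 + \varepsilon_1$$
and thus
$$\lim_{x\to\infty}\frac{\pi(x)\ln x}{\theta(x)} = 1.$$
By definition then
$$\pi(x)\ln x \sim \theta(x) \implies \frac{\pi(x)\ln x}{x} \sim \frac{\theta(x)}{x}.$$
From this it is straightforward that as $x \to \infty$ we have
$$\frac{\theta(x)}{x} \to 1 \quad\text{if and only if}\quad \frac{\pi(x)}{x/\ln x} \to 1,$$
or
$$\theta(x) \sim x \quad\text{if and only if}\quad \pi(x) \sim \frac{x}{\ln x}.$$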
Inx 


In the proof of the prime number theorem that we will present, we will actually show that $\psi(x) \sim x$ and then invoke the above result.
As we remarked in the last section, Chebyshev also proved that if
$$\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x}$$
existed, then the limit would have to be one. Thus, he seemed very close to the prime number theorem. However, he could not actually prove this limit existed. We close this section by giving a proof of this result of Chebyshev. We need first the following result due to Mertens. This is one of several results in the area due to Mertens and known collectively as Mertens’ Theorems (see [N]).


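Theorem 4.3.3 If $\Lambda(n)$ is the von Mangoldt function then
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \ln x + O(1).$$

Proof Consider the sum
$$\sum_{n\le x}\ln\left(\frac{x}{n}\right).$$
Since $\ln\left(\frac{x}{t}\right)$ is a decreasing function of $t$, we have for $n \ge 2$
$$\ln\left(\frac{x}{n}\right) \le \int_{n-1}^{n}\ln\left(\frac{x}{t}\right)dt.$$
From this it follows, substituting $u = x/t$, that
$$\sum_{n=2}^{[x]}\ln\left(\frac{x}{n}\right) \le \int_1^x\ln\left(\frac{x}{t}\right)dt = x\int_1^x\frac{\ln u}{u^2}\,du \le x\int_1^\infty\frac{\ln u}{u^2}\,du.$$
However, the infinite integral $\int_1^\infty\frac{\ln u}{u^2}\,du$ is convergent, so it has a finite value $A$. Therefore
$$\sum_{n=2}^{[x]}\ln\left(\frac{x}{n}\right) \le Ax \implies \sum_{n\le x}\ln\left(\frac{x}{n}\right) = O(x).$$
Hence
$$\sum_{n\le x}\ln n = [x]\ln x + O(x) = x\ln x + O(x).$$
As in the proof of Chebyshev’s estimate let
$$e_p = \sum_{i\ge 1}\left[\frac{x}{p^i}\right]$$
so that
$$[x]! = \prod_{p\le x}p^{e_p}.$$
Then taking logarithms we get
$$\ln([x]!) = \ln\prod_{p\le x}p^{e_p} = \sum_{n\le x}\ln n = \sum_{p\le x}e_p\ln p = \sum_{m\ge 1}\ \sum_{p^m\le x}\left[\frac{x}{p^m}\right]\ln p = \sum_{n\le x}\left[\frac{x}{n}\right]\Lambda(n)$$
where $\Lambda(n)$ is the von Mangoldt function. Further, since $\frac{x}{n} < \left[\frac{x}{n}\right] + 1$,
$$\sum_{n\le x}\frac{x}{n}\Lambda(n) \le \sum_{n\le x}\left[\frac{x}{n}\right]\Lambda(n) + \sum_{n\le x}\Lambda(n) = \sum_{n\le x}\left[\frac{x}{n}\right]\Lambda(n) + \psi(x) = \sum_{n\le x}\left[\frac{x}{n}\right]\Lambda(n) + O(x)$$
since $\psi(x) = O(x)$. Since also $\left[\frac{x}{n}\right] \le \frac{x}{n}$, combining these inequalities gives us
$$\sum_{n\le x}\frac{x}{n}\Lambda(n) = \sum_{n\le x}\ln n + O(x) = x\ln x + O(x).$$
Removing the factor $x$ yields finally
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \ln x + O(1).$$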


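As an immediate corollary we obtain the following.

Corollary 4.3.1 $\sum_{p\le x}\frac{\ln p}{p} = \ln x + O(1)$.

Proof By definition
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \sum_{m\ge 1}\ \sum_{p^m\le x}\frac{\ln p}{p^m}.$$
This implies that
$$0 \le \sum_{n\le x}\frac{\Lambda(n)}{n} - \sum_{p\le x}\frac{\ln p}{p} = \sum_{m\ge 2}\ \sum_{p^m\le x}\frac{\ln p}{p^m} \le \sum_{p}\ln p\sum_{m\ge 2}\frac{1}{p^m} = \sum_{p}\frac{\ln p}{p(p-1)} \le \sum_{n=2}^{\infty}\frac{\ln n}{n(n-1)}.$$
This last infinite series converges to some value $S$. Hence
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \sum_{p\le x}\frac{\ln p}{p} + A$$
for some value $A = A(x)$ with $0 \le A \le S$. Since from the previous theorem $\sum_{n\le x}\frac{\Lambda(n)}{n} = \ln x + O(1)$, it follows that
$$\sum_{p\le x}\frac{\ln p}{p} = \ln x + O(1).$$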
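Theorem 4.3.3 and Corollary 4.3.1 only assert that the differences from $\ln x$ stay bounded; a numerical sketch of our own makes this visible. The sum over $n$ is computed prime by prime, using that $\frac{\Lambda(n)}{n}$ is nonzero only at prime powers $n = p^k$, where it equals $\frac{\ln p}{p^k}$.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [m for m in range(limit + 1) if flags[m]]

def mertens_sums(x):
    """Return (sum_{n<=x} Lambda(n)/n, sum_{p<=x} ln p / p) for an integer x >= 2."""
    lambda_sum = prime_sum = 0.0
    for p in primes_up_to(x):
        log_p = math.log(p)
        prime_sum += log_p / p
        power = p
        while power <= x:       # the terms Lambda(p^k)/p^k = ln p / p^k
            lambda_sum += log_p / power
            power *= p
    return lambda_sum, prime_sum

for exponent in range(2, 7):
    x = 10 ** exponent
    lam, pri = mertens_sums(x)
    print(f"x = 10^{exponent}: Lambda sum - ln x = {lam - math.log(x):.4f}, "
          f"prime sum - ln x = {pri - math.log(x):.4f}")
```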


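Theorem 4.3.4 If $\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x}$ exists then $\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x} = 1$.

Proof Recall that $\psi(x) = \sum_{n\le x}\Lambda(n)$. Then
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \sum_{n\le x-1}\psi(n)\left(\frac{1}{n} - \frac{1}{n+1}\right) + \frac{\psi(x)}{[x]},$$
which follows easily (by summation by parts) since $\Lambda(n) = \psi(n) - \psi(n-1)$. Since $\psi(t) = \psi(n)$ if $n \le t < n + 1$, we have
$$\psi(n)\left(\frac{1}{n} - \frac{1}{n+1}\right) = \int_n^{n+1}\frac{\psi(t)}{t^2}\,dt.$$
Summing then yields
$$\sum_{n\le x-1}\psi(n)\left(\frac{1}{n} - \frac{1}{n+1}\right) = \int_1^{[x]}\frac{\psi(t)}{t^2}\,dt = \int_2^{[x]}\frac{\psi(t)}{t^2}\,dt$$
since $\psi(t) = 0$ for $1 \le t < 2$. Hence
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \frac{\psi(x)}{[x]} + \int_2^{[x]}\frac{\psi(t)}{t^2}\,dt.$$
Since
$$\sum_{n\le x}\frac{\Lambda(n)}{n} = \ln x + O(1) \quad\text{and}\quad \frac{\psi(x)}{[x]} = O(1),$$
and since $\int_{[x]}^{x}\frac{\psi(t)}{t^2}\,dt \le \frac{\psi(x)}{[x]^2} = O(1)$, it follows that
$$\int_2^x\frac{\psi(t)}{t^2}\,dt = \ln x + O(1).$$
Now suppose that $\liminf_{x\to\infty}\frac{\psi(x)}{x} \ge 1 + \varepsilon$ with $\varepsilon > 0$. Then
$$\psi(x) \ge \left(1 + \frac{\varepsilon}{2}\right)x$$
for $x$ sufficiently large, say $x \ge x_0$. Then
$$\int_2^x\frac{\psi(t)}{t^2}\,dt = \int_2^{x_0}\frac{\psi(t)}{t^2}\,dt + \int_{x_0}^x\frac{\psi(t)}{t^2}\,dt \ge \left(1 + \frac{\varepsilon}{2}\right)\ln x + A$$
for some constant $A$. However, this contradicts
$$\int_2^x\frac{\psi(t)}{t^2}\,dt = \ln x + O(1).$$
On the other hand, if $\limsup_{x\to\infty}\frac{\psi(x)}{x} \le 1 - \varepsilon$ with $\varepsilon > 0$, we obtain an analogous contradiction. Therefore
$$\limsup_{x\to\infty}\frac{\psi(x)}{x} \ge 1 \quad\text{and}\quad \liminf_{x\to\infty}\frac{\psi(x)}{x} \le 1,$$
and therefore, if the limit of $\frac{\psi(x)}{x}$ existed as $x \to \infty$, the value would have to be one. Further, since
$$\frac{\psi(x)}{x} = \frac{\theta(x)}{x} + o(1) \quad\text{and}\quad \frac{\pi(x)}{x/\ln x} = \frac{\theta(x)}{x} + o(1)$$
(the second because $\frac{\pi(x)\ln x}{\theta(x)} \to 1$, as shown in the proof of Theorem 4.3.2, while $\frac{\theta(x)}{x}$ is bounded), this shows that if $\frac{\pi(x)}{x/\ln x}$ has a limit, its value must be one also.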


4.4 The Riemann Zeta Function and the Riemann Hypothesis


From Chebyshev’s estimate and its consequences it seemed that a proof of the prime number theorem was close at hand. In 1860, B. Riemann attempted to prove this main result. Riemann eventually wrote only one paper in number theory, and although he failed in his primary goal of proving the prime number theorem, this paper had a profound effect both on number theory in particular and on mathematics in general. Much as Gauss’s Disquisitiones Arithmeticae set the direction for elementary and algebraic number theory, Riemann’s work set the direction for analytic number theory. Riemann’s basic new (and brilliant) idea was to extend the zeta function $\zeta(s)$ of Euler (see Section 3.1.2) to complex arguments, that is, to allow $s$ to be a complex number. This idea of Riemann initiated the use of complex analysis, specifically the theory of analytic functions and complex integration, in number theory and laid the groundwork for modern analytic number theory. Recall that the use of analysis begins with the Euler zeta function and continues through the work of Dirichlet. However, it is this paper of Riemann and the introduction of complex analytic methods that really mark the beginning of analytic number theory.

Euler had introduced $\zeta(s)$ for real $s$ in giving a proof that there are infinitely many primes and that the series $\sum_p \frac{1}{p}$ diverges. Dirichlet used a variation of this function, still for real $s$, in building the Dirichlet series used in the proof of his theorem on primes in arithmetic progressions (see Section 3.3). Riemann, in allowing complex $s$, showed that the resulting function $\zeta(s)$ is an analytic function for $\operatorname{Re}(s) > 1$ and further can be continued analytically (see the next section) to a function, which we will also denote $\zeta(s)$, that is analytic in all of $\mathbb{C}$ except $s = 1$. Further, $s = 1$ is a simple pole with residue 1, that is,
$$\zeta(s) = \frac{1}{s-1} + H(s)$$
where $H(s)$ is an entire function. Riemann then showed that knowledge of the location of the complex zeros of $\zeta(s)$ describes the density of primes. In particular, if there are no zeros along the line $\operatorname{Re}(s) = 1$, this would then imply the prime number theorem. This was precisely the main step in the proofs of the prime number theorem given independently by Hadamard and de la Vallée Poussin 36 years after Riemann’s paper.
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4.4.1 The Real Zeta Function of Euler 


Recall that the Euler zeta function was defined for real s > 1 by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$

From the classical p-series test this series converges absolutely for s > 1 and hence defines a real C^∞ function in this range. Further, as s → 1⁺, ζ(s) → ∞, which implies through the Euler product representation that there are infinitely many primes (see Section 3.1.3).

As a direct consequence of the Fundamental Theorem of Arithmetic, Euler derived the following product decomposition (see Section 3.1.2):

$$\zeta(s) = \prod_{p\ \text{prime}} \frac{1}{1-p^{-s}}.$$

This product decomposition will remain valid for complex s with Re(s) > 1. Since the product converges absolutely (Σ_p p^{−s} converges) and no factor vanishes, it is clear that there are no real zeros of ζ(s) for s > 1.

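There are ties between the zeta function and several of the other arithmetical functions which we have worked with in this chapter. First, from the Euler product decomposition we obtain, by logarithmic differentiation,

$$\frac{\zeta'(s)}{\zeta(s)} = -\sum_{p} \frac{\ln p \cdot p^{-s}}{1-p^{-s}} = -\sum_{p}\sum_{m=1}^{\infty} \frac{\ln p}{p^{ms}}.$$

Recall again that the von Mangoldt function Λ(n) is defined for positive integers by

$$\Lambda(n) = \begin{cases} \ln p & \text{if } n = p^{\alpha},\ \alpha \ge 1, \\ 0 & \text{for all other } n > 0. \end{cases}$$

Therefore

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p}\sum_{m=1}^{\infty} \frac{\ln p}{p^{ms}} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$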


Next, again from the Euler product decomposition, we have for s > 1

$$\frac{1}{\zeta(s)} = \prod_{p} \left(1 - p^{-s}\right).$$

Expanding the infinite product yields

$$\frac{1}{\zeta(s)} = 1 - \sum_{p} \frac{1}{p^s} + \sum_{p<q} \frac{1}{(pq)^s} - \sum_{p<q<r} \frac{1}{(pqr)^s} + \cdots$$

with p, q, r, … primes. In this summation only squarefree integers appear. Further, for a squarefree integer n the coefficient of n^{−s} in the above expansion is −1 or +1 according as the number of prime factors of n is odd or even. This is precisely μ(n), where μ(n) is the Möbius function (see Sections 3.3 and 3.6). Therefore

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}.$$

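Lemma 4.4.1 For s > 1 we have the following relationships.
1. $\dfrac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \dfrac{\mu(n)}{n^s}$, where μ(n) is the Möbius function.
2. $-\dfrac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \dfrac{\Lambda(n)}{n^s}$, where Λ(n) is the von Mangoldt function.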
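As a numerical illustration (not part of the proof), the Python sketch below checks the coefficient identities behind Lemma 4.4.1. Multiplying out ζ(s)·Σ μ(n)n^{−s} = 1 and ζ(s)·Σ Λ(n)n^{−s} = −ζ′(s) = Σ (ln n) n^{−s} and comparing coefficients of n^{−s} gives Σ_{d|n} μ(d) = 1 for n = 1 and 0 otherwise, and Σ_{d|n} Λ(d) = ln n. The function name `mobius_and_mangoldt`, the cutoff N = 2000 and the test point s = 2 are our own illustrative choices; the comparison value ζ(2) = π²/6 is Theorem 4.4.2 below.

```python
import math


def mobius_and_mangoldt(n_max):
    """Sieve mu(n) and Lambda(n) for 1 <= n <= n_max; index 0 is unused."""
    mu = [1] * (n_max + 1)
    lam = [0.0] * (n_max + 1)
    composite = [False] * (n_max + 1)
    primes = []
    for p in range(2, n_max + 1):
        if composite[p]:
            continue
        primes.append(p)
        for m in range(2 * p, n_max + 1, p):
            composite[m] = True
        for m in range(p, n_max + 1, p):
            mu[m] = -mu[m]              # one more distinct prime factor
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0                   # divisible by p^2: not squarefree
        pk = p
        while pk <= n_max:              # Lambda(p^k) = ln p
            lam[pk] = math.log(p)
            pk *= p
    return mu, lam, primes


N = 2000
mu, lam, primes = mobius_and_mangoldt(N)

# Divisor sums over d | n, accumulated by running over the multiples of each d.
mu_div = [0] * (N + 1)
lam_div = [0.0] * (N + 1)
for d in range(1, N + 1):
    for m in range(d, N + 1, d):
        mu_div[m] += mu[d]
        lam_div[m] += lam[d]

s = 2
zeta_2 = math.pi ** 2 / 6
mu_series = sum(mu[n] / n ** s for n in range(1, N + 1))       # ~ 1/zeta(2)
lam_series = sum(lam[n] / n ** s for n in range(1, N + 1))     # ~ -zeta'(2)/zeta(2)
euler_product = math.prod(1 / (1 - p ** -s) for p in primes)   # ~ zeta(2)

print(mu_div[1], set(mu_div[2:]))          # 1 {0}
print(mu_series * zeta_2, lam_series, euler_product / zeta_2)
```

The truncated sums approximate 1/ζ(2) ≈ 0.6079 and −ζ′(2)/ζ(2) ≈ 0.5700 only to within roughly 1/N, since the neglected tails are of that size.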


Euler further determined the exact value of ζ(2) and showed that it was π²/6. Originally this was done by a clever use of certain trigonometric identities (see [NZM]). Subsequently Euler developed a method to determine the values of ζ(s) at all even positive integers. We first give a proof of the basic result that ζ(2) = π²/6 using a different approach. Some basic ideas from the theory of Fourier series are needed.

Recall that a real or complex function f(x) is periodic of period L if f(x + L) = f(x) for all x. In the early 1800s, Fourier attempted to prove that any periodic function can be expressed as a trigonometric series, that is, a sum of sine and cosine functions. If f(x) is periodic of period 2L then its Fourier series is

$$a_0 + \sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L}\right).$$

Using certain orthogonality relations between sines and cosines, Fourier showed that if f(x) equals its Fourier series, then the coefficients a_0, a_n, b_n must be given by

$$a_0 = \frac{1}{2L}\int_{-L}^{L} f(x)\,dx,$$

$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\frac{n\pi x}{L}\,dx, \qquad n = 1, 2, \ldots,$$

$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\frac{n\pi x}{L}\,dx, \qquad n = 1, 2, \ldots$$

The a_n, b_n are called the Fourier coefficients. Fourier assumed that f(x) equals its Fourier series, but this was not definitively proved until the theory of Lebesgue integration was developed. What was then obtained is called the Fourier Convergence Theorem.


Theorem 4.4.1 (Fourier Convergence Theorem) (see [Gr]) Let f(x) be periodic of period 2L. Then:

1. If both f(x) and f′(x) are piecewise continuous on (−L, L), then the Fourier series converges pointwise to the mean value $\frac{f(x^+) + f(x^-)}{2}$.
2. If both f(x) and f′(x) are continuous on (−L, L), then the Fourier series converges uniformly to f(x).


Therefore, a C¹ periodic function is everywhere represented by its Fourier series, realizing Fourier’s original idea. We now give Euler’s result using Fourier series. In Section 4.7 we present a separate, more general proof.


Theorem 4.4.2 ζ(2) = π²/6.


Proof Let f(x) = x² for −π ≤ x ≤ π, continued periodically with period 2π. This function is continuous everywhere and differentiable everywhere except at the odd integer multiples of π, where f′(x) has jump discontinuities. Since f is continuous, the mean value (f(x⁺) + f(x⁻))/2 equals f(x) at every point, so by the Fourier convergence theorem f(x) is everywhere represented by its Fourier series.

We apply the formulas. First, f(x) is an even function, so there are only cosine terms and hence b_n = 0 for all n. Then

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{\pi^2}{3}$$

and

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\cos(nx)\,dx = (-1)^n\frac{4}{n^2},$$

using integration by parts and the fact that cos(nπ) = (−1)^n. Therefore the Fourier series for f(x) is given by

$$x^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(nx), \qquad -\pi \le x \le \pi.$$


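Now let x = π in the Fourier expansion. Then

$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(n\pi).$$

But cos(nπ) = (−1)^n, so

$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{1}{n^2}, \qquad\text{hence}\qquad \zeta(2) = \sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{1}{4}\left(\pi^2 - \frac{\pi^2}{3}\right) = \frac{\pi^2}{6}.$$

In Theorem 6.4.16, we will prove that π is transcendental. With this fact the above result leads directly to another proof that there are infinitely many primes: if there were only finitely many primes, the Euler product ζ(2) = ∏_p (1 − p^{−2})^{−1} would be a finite product of rational numbers and hence rational, whereas π²/6 is transcendental.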
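The Fourier expansion used in the proof can be checked numerically. The sketch below, an illustration only, evaluates partial sums of π²/3 + 4Σ(−1)^n cos(nx)/n² at a few interior points and at x = π, where solving for Σ 1/n² recovers ζ(2). The helper name `fourier_x_squared` and the numbers of terms are arbitrary choices of ours.

```python
import math


def fourier_x_squared(x, terms):
    """Partial sum pi^2/3 + 4 * sum_{n=1}^{terms} (-1)^n cos(n x) / n^2 (series for x^2 on [-pi, pi])."""
    total = math.pi ** 2 / 3
    for n in range(1, terms + 1):
        total += 4 * (-1) ** n * math.cos(n * x) / n ** 2
    return total


# Interior points: the partial sums approach x^2.
for x in (0.0, 1.0, 2.5):
    print(x, x ** 2, fourier_x_squared(x, 2000))

# At x = pi, cos(n pi) = (-1)^n, so the series is pi^2/3 + 4 * sum 1/n^2;
# solving for the sum recovers zeta(2).
zeta2_from_fourier = (fourier_x_squared(math.pi, 100000) - math.pi ** 2 / 3) / 4
print(zeta2_from_fourier, math.pi ** 2 / 6)
```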

Euler’s method to find ζ(2) involved a detailed look at certain trigonometric identities (see [NZM] or [Na]). Subsequently, he developed a technique to determine the value of ζ(s) for s an even positive integer. In particular, he tied the values of ζ(2n) to the Bernoulli numbers B_n. These numbers are defined in terms of the coefficients of the Taylor series expansion about x = 0 of the function f(x) = x/(e^x − 1), with f(0) = 1. Specifically,

$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n \frac{x^n}{n!}.$$


Euler proved the following. For a proof of this result see Section 4.7. 


Theorem 4.4.3 For every positive integer n,

$$\zeta(2n) = \frac{(-1)^{n-1}B_{2n}}{2\,(2n)!}\,(2\pi)^{2n}.$$
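Theorem 4.4.3 can be checked with exact rational arithmetic. The sketch below is one possible approach, not the method of the text: it generates B_n from the identity Σ_{k=0}^{m} C(m+1, k) B_k = 0 for m ≥ 1 (obtained by multiplying x/(e^x − 1) = Σ B_n x^n/n! by e^x − 1 and comparing coefficients), then evaluates the formula of the theorem against a plain partial sum. The function names are our own.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(m_max):
    """Exact B_0, ..., B_{m_max} for x/(e^x - 1) = sum B_m x^m / m!.

    Uses sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1, obtained by multiplying
    the generating function by e^x - 1 and comparing coefficients.
    """
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


B = bernoulli_numbers(12)


def zeta_even(n):
    """zeta(2n) via Theorem 4.4.3: (-1)^(n-1) B_{2n} (2 pi)^(2n) / (2 (2n)!)."""
    return (-1) ** (n - 1) * float(B[2 * n]) * (2 * pi) ** (2 * n) / (2 * factorial(2 * n))


def zeta_partial(s, terms=100000):
    """Plain partial sum of sum 1/k^s, for comparison."""
    return sum(1 / k ** s for k in range(1, terms + 1))


print(B[:7])
for n in (1, 2, 3):
    print(2 * n, zeta_even(n), zeta_partial(2 * n))
```

With this generating function B_1 = −1/2 and B_n = 0 for odd n ≥ 3, so only the even-index values enter the theorem.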

Substituting B_2 = 1/6 and B_4 = −1/30 in this formula yields ζ(2) = π²/6 and ζ(4) = π⁴/90. Euler himself computed ζ(2n) for the even arguments up to ζ(26). From Euler’s formula and the fact that π is transcendental it follows that ζ(2n) is transcendental for every even positive integer 2n. On the other hand, very little is known about the arithmetic nature of ζ(s) for s = 2n + 1 an odd positive integer. It was shown by R. Apéry (also by DeBranges) that ζ(3) is irrational, and Apéry also gave the following formula:

$$\zeta(3) = \frac{5}{2}\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^3\binom{2n}{n}}.$$

The number ζ(3) is called Apéry’s constant and has an approximate value of 1.202057. Euler’s result has also been recovered using Fourier series methods along the lines of the proof we gave for ζ(2) = π²/6.
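Apéry’s series ζ(3) = (5/2) Σ_{n≥1} (−1)^{n−1}/(n³ C(2n, n)) converges very quickly, since the central binomial coefficient C(2n, n) grows roughly like 4^n. The sketch below, for illustration only, compares it with a direct partial sum of Σ 1/n³ to which the integral estimate 1/(2N²) of the tail is added (that tail correction is our choice, not from the text).

```python
from math import comb


def zeta3_apery(terms):
    """Apery's series: (5/2) * sum_{n=1}^{terms} (-1)^(n-1) / (n^3 * C(2n, n))."""
    return 2.5 * sum((-1) ** (n - 1) / (n ** 3 * comb(2 * n, n)) for n in range(1, terms + 1))


def zeta3_direct(terms):
    """Partial sum of 1/n^3 plus the integral estimate 1/(2 N^2) for the tail."""
    return sum(1 / n ** 3 for n in range(1, terms + 1)) + 1 / (2 * terms ** 2)


print(zeta3_apery(30))        # terms shrink roughly like 4^(-n)
print(zeta3_direct(10 ** 5))
```

Thirty terms of Apéry’s series already exhaust double precision, while the direct series needs the tail correction to come close.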

There are several equivalent analytic expressions for ζ(s) for real s > 1. We mention one such expression here because of its ties to the analytic continuation of the complex Riemann zeta function, which will be discussed shortly. In order to introduce this expression we must first describe the Gamma function.

Definition 4.4.1 If s > 0 the Gamma function is given by

$$\Gamma(s) = \int_0^{\infty} e^{-t}\,t^{s-1}\,dt.$$


By a straightforward integration by parts (see exercises) we obtain the following.

Lemma 4.4.2 Γ(s + 1) = sΓ(s).

It is easy to determine that Γ(1) = 1. Hence

Γ(2) = 1·Γ(1) = 1, Γ(3) = 2Γ(2) = 2!, Γ(4) = 3Γ(3) = 3!, …

An easy induction then gives the following.

Corollary 4.4.1 Γ(n) = (n − 1)! for any n ≥ 1, n ∈ ℕ.


The Gamma function is thus an extension of the factorial function.

The functional equation Γ(s + 1) = sΓ(s), read as Γ(s) = Γ(s + 1)/s, allows us to extend the definition of Γ(s) to all nonpositive real numbers s except 0 and the negative integers. Further, lim_{s→−n} |Γ(s)| = ∞ for n = 0, 1, 2, ….
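The extension of Γ to negative non-integers described above amounts to applying Γ(s) = Γ(s + 1)/s repeatedly until the argument becomes positive. The sketch below does exactly that, calling Python’s `math.gamma` only at a positive argument; since `math.gamma` itself also accepts negative non-integers, it provides an independent comparison. The function name `gamma_via_recursion` is our own.

```python
import math


def gamma_via_recursion(s):
    """Gamma(s) for real s not in {0, -1, -2, ...}.

    Applies Gamma(s) = Gamma(s + 1) / s repeatedly, i.e.
    Gamma(s) = Gamma(s + k) / (s (s + 1) ... (s + k - 1)) with s + k > 0,
    and calls math.gamma only at the positive argument s + k.
    """
    if s <= 0 and s == int(s):
        raise ValueError("Gamma has poles at 0, -1, -2, ...")
    k = 0
    denominator = 1.0
    while s + k <= 0:
        denominator *= s + k
        k += 1
    return math.gamma(s + k) / denominator


print(gamma_via_recursion(-0.5), -2 * math.sqrt(math.pi))   # Gamma(-1/2) = -2 sqrt(pi)
for s in (-0.3, -2.7, -4.1):
    print(s, gamma_via_recursion(s), math.gamma(s))
print(gamma_via_recursion(-1 + 1e-6))    # |Gamma(s)| blows up near s = -1
```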

Another important result, whose proof we will outline in the exercises, is the following.

Lemma 4.4.3 Γ(1/2) = √π.

The relation we wish to show for ζ(s) is given in the next theorem.


Theorem 4.4.4 For real s > 1,

$$\zeta(s) = \frac{1}{\Gamma(s)}\int_0^{\infty}\frac{t^{s-1}}{e^t - 1}\,dt.$$

Proof For s > 1 let

$$G(s) = \frac{1}{\Gamma(s)}\int_0^{\infty}\frac{t^{s-1}}{e^t - 1}\,dt.$$


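We show that G(s) = ζ(s). Recall that the sum of a geometric series with ratio r, |r| < 1, is given by

$$\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}.$$

It follows, taking r = e^{−t} with t > 0, that

$$\frac{1}{1-e^{-t}} = \sum_{k=0}^{\infty} e^{-kt}.$$

Now

$$\frac{t^{s-1}}{e^t - 1} = \frac{t^{s-1}e^{-t}}{1 - e^{-t}} = t^{s-1}\sum_{k=0}^{\infty} e^{-t}e^{-kt} = \sum_{k=1}^{\infty} t^{s-1}e^{-kt}.$$

Since all terms are nonnegative, the sum and the integral may be interchanged, and it follows that

$$\int_0^{\infty}\frac{t^{s-1}}{e^t - 1}\,dt = \sum_{k=1}^{\infty}\left(\int_0^{\infty} e^{-kt}\,t^{s-1}\,dt\right).$$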


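Now let y = kt, so that dt = (1/k) dy, and substitute:

$$G(s) = \frac{1}{\Gamma(s)}\sum_{k=1}^{\infty}\int_0^{\infty} e^{-y}\left(\frac{y}{k}\right)^{s-1}\frac{dy}{k} = \frac{1}{\Gamma(s)}\sum_{k=1}^{\infty}\frac{1}{k^s}\int_0^{\infty} y^{s-1}e^{-y}\,dy.$$

However, ∫_0^∞ y^{s−1}e^{−y} dy = Γ(s), and therefore

$$G(s) = \sum_{k=1}^{\infty}\frac{1}{k^s} = \zeta(s).$$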
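Theorem 4.4.4 can be checked numerically for a few real s > 1. The sketch below is an illustration under our own choices (midpoint rule, truncation of the integral at t = 60, and an Euler-Maclaurin-style tail estimate for the series): it compares (1/Γ(s))∫_0^∞ t^{s−1}/(e^t − 1) dt with a partial sum of Σ n^{−s}. Using `math.expm1` avoids cancellation in e^t − 1 for small t.

```python
import math


def zeta_integral(s, upper=60.0, steps=60000):
    """Midpoint rule for (1/Gamma(s)) * integral_0^upper t^(s-1) / (e^t - 1) dt.

    The integrand decays like t^(s-1) e^(-t), so truncating at t = 60 is harmless
    for moderate s; expm1 avoids cancellation in e^t - 1 near t = 0.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (s - 1) / math.expm1(t)
    return total * h / math.gamma(s)


def zeta_series(s, N=20000):
    """Partial sum of n^(-s) plus the tail estimate N^(1-s)/(s-1) - N^(-s)/2."""
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1) - 0.5 * N ** -s


for s in (2, 3, 4):
    print(s, zeta_integral(s), zeta_series(s))
```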


4.4.2 Analytic Functions and Analytic Continuation 


Riemann introduced complex analysis, specifically the theory of analytic functions 
and the theory of complex integration, into the study of number theory. In this section, 
we briefly go over the basic necessary ideas. 

If w = f(z) is a complex function then the complex derivative is defined in 
exactly the same formal manner as the real derivative. 


Definition 4.4.2 If f(z) is any complex function, then its derivative f′(z_0) at z_0 ∈ ℂ is

$$f'(z_0) = \lim_{\Delta z \to 0}\frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z}$$

whenever this limit exists. If f′(z_0) exists, then f(z) is differentiable at z_0. f(z) is differentiable on a whole region if it is differentiable at each point of the region.


The complex function w = f(z) is analytic or holomorphic at z_0 if f(z) is differentiable in a circular neighborhood of z_0. f(z) is analytic in a domain U if it is analytic at each point of U. If f(z) is analytic throughout ℂ, then it is called an entire function. Many of the standard functions from analysis, such as polynomials, e^z, sin z and cos z, appropriately defined for complex arguments, are entire.

If f(z) is a complex function defined on a region U containing the curve

$$\gamma(t) = x(t) + i\,y(t), \qquad t_0 \le t \le t_1,$$

then the complex contour integral ∫_γ f(z) dz is defined by

$$\int_{\gamma} f(z)\,dz = \int_{t_0}^{t_1} f(\gamma(t))\,\gamma'(t)\,dt.$$


Most of complex analysis deals with the properties and implications of complex 
integration of analytic functions. One of the cornerstones of this theory is Cauchy’s 
Theorem. 


Theorem 4.4.5 (Cauchy’s Theorem) Let f(z) be analytic throughout a simply connected domain U and suppose γ is a simple closed curve entirely contained in U. Then

$$\oint_{\gamma} f(z)\,dz = 0.$$


As a consequence of Cauchy’s Theorem one obtains (via the Cauchy integral formulae) that analytic functions have derivatives of all orders. That is, if f(z) is analytic at z_0 then f′(z_0), f″(z_0), …, f^{(n)}(z_0), … all exist. Further, in a neighborhood of z_0 the function f(z) is given by a convergent Taylor series centered on z_0: for some R > 0,

$$f(z) = \sum_{n=0}^{\infty}\frac{f^{(n)}(z_0)}{n!}(z - z_0)^n \qquad\text{for } |z - z_0| < R.$$

The derivatives f^{(n)}(z_0) are given by the Cauchy integral formula as

$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_{\gamma}\frac{f(z)}{(z - z_0)^{n+1}}\,dz,$$

where γ is any simple closed curve around z_0 within a simply connected domain U where f(z) is analytic. Recall that a simply connected domain in ℂ is a region where every simple closed curve can be continuously shrunk to a point, that is, a region which has no holes in it (see [Ah]). Hence the values of a complex analytic function and its derivatives inside γ are determined by the values of the function on γ. In particular, for n = 0 and γ a circle centered at z_0, f(z_0) is the average of the values of f on that circle, so the interior values are a type of average of the boundary values. Although we will not pursue this further, this idea has been exploited extensively in number theory and analysis. The next theorem summarizes all these comments.

Theorem 4.4.6 Suppose f(z) is analytic in a simply connected domain U containing z_0 and γ is a simple closed curve within U around z_0. Then:

1. f(z) has derivatives of all orders at z_0.
2. There exists an R > 0 such that f(z) is given by a convergent Taylor series centered on z_0:
$$f(z) = \sum_{n=0}^{\infty}\frac{f^{(n)}(z_0)}{n!}(z - z_0)^n \qquad\text{for } |z - z_0| < R.$$
3. The derivatives are given by the Cauchy integral formulae as
$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_{\gamma}\frac{f(z)}{(z - z_0)^{n+1}}\,dz.$$
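Part 3 of Theorem 4.4.6 can be turned into a numerical differentiation scheme. Parametrizing a circle as z = z_0 + re^{iθ} gives f^{(n)}(z_0) = n!/(2πr^n) ∫_0^{2π} f(z_0 + re^{iθ}) e^{−inθ} dθ, and the trapezoidal rule is highly accurate for such smooth periodic integrands. The sketch below is an illustration only; the function name, the radius and the number of sample points are arbitrary choices of ours.

```python
import cmath
import math


def derivative_via_cauchy(f, z0, n, radius=1.0, points=256):
    """n-th derivative of an analytic f at z0 from the Cauchy integral formula.

    On the circle z = z0 + r e^{i theta} the formula reads
        f^(n)(z0) = n! / (2 pi r^n) * integral_0^{2 pi} f(z) e^{-i n theta} d theta,
    approximated here by the trapezoidal rule with equally spaced theta.
    Assumes f is analytic on and inside the circle.
    """
    total = 0j
    for k in range(points):
        theta = 2 * math.pi * k / points
        total += f(z0 + radius * cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return math.factorial(n) * total / (points * radius ** n)


for n in range(4):
    print(n, derivative_via_cauchy(cmath.exp, 0, n), derivative_via_cauchy(cmath.sin, 0, n))
```

With n = 0 the function returns the average of f over the circle, which is the averaging property of analytic functions mentioned before the theorem.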


We note that Theorem 4.4.6 is in contrast to the situation for real differentiable functions. A function y = f(x) with x, y ∈ ℝ can have one derivative but not two, two derivatives but not three, and so on. Further, there are real functions which are C^∞, that is, have derivatives of all orders, but which are not given by their Taylor series; a standard example is f(x) = e^{−1/x²} for x ≠ 0 with f(0) = 0, all of whose derivatives vanish at x = 0, so that its Taylor series at 0 is identically zero. A real function which is given by a convergent Taylor series centered on x_0 in a neighborhood of x_0 is said to be real analytic at x_0.

An extremely important concept in studying the zeta function is that of analytic continuation. The basic idea is the following: suppose a complex analytic function f(z) is given by an analytic expression which holds in a domain S in ℂ. Suppose that, within S or within a subset of S, this expression agrees with another analytic expression which holds in a larger domain S′. Then the second expression can be used to analytically extend, or continue, f(z) to the larger domain S′. We make this precise.

Suppose that f_1(z) is analytic on a domain S_1 and f_2(z) is analytic on a domain S_2. Suppose that S_1 ∩ S_2 ≠ ∅ and f_1(z) = f_2(z) on S_1 ∩ S_2. Then (f_2(z), S_2) is said to be a direct analytic continuation of (f_1(z), S_1). The individual pairs (f_1, S_1) and (f_2, S_2) are called function elements. A function element (f, S) is an analytic continuation of (f_1, S_1) if there is a finite chain of function elements (f_1, S_1), (f_2, S_2), …, (f_n, S_n) = (f, S) in which each neighboring pair is a direct analytic continuation. A global analytic function is a nonempty collection of function elements F = {(f_α, S_α)} such that any two in this collection are analytic continuations of each other. A global analytic function is complete if it contains all analytic continuations of any of its function elements.

Finally, analytic continuation is essentially unique in the sense that two analytic 
functions which agree on a sufficiently large domain, for example, an open neigh- 
borhood of a curve, are identical. 

As an example of a type of analytic continuation, consider the Gamma function

$$\Gamma(s) = \int_0^{\infty} e^{-t}\,t^{s-1}\,dt.$$

As written, this has meaning only for real s > 0. However, Euler proved that for real s > 0

$$\Gamma(s) = \frac{e^{-\gamma s}}{s}\prod_{n=1}^{\infty}\left(1 + \frac{s}{n}\right)^{-1}e^{s/n}, \qquad (4.4.2.1)$$

where γ is Euler’s constant, with approximate value 0.57722. The expression in (4.4.2.1) is valid for complex s with Re(s) > 0 and can be used for the definition of the complex Gamma function Γ(z). Using the relation

$$\Gamma(z + 1) = z\,\Gamma(z)$$

the complex function can be continued to a function which is analytic except at z = 0, z = −1, z = −2, ….

If f(z) is not analytic at z_0 but is analytic in a punctured neighborhood of z_0, then z_0 is called an isolated singularity. Isolated singularities are classified as removable, in which case lim_{z→z_0} f(z) exists and is finite; a pole, in which case lim_{z→z_0} |f(z)| = ∞; or an essential singularity, in which case lim_{z→z_0} f(z) does not exist, even as an infinite limit. For a pole z_0 there exists an integer m ≥ 1 such that

$$f(z) = \frac{h(z)}{(z - z_0)^m}$$

with h(z) analytic at z_0. The minimal integer m with that property is called the order of the pole. If m = 1 then z_0 is a simple pole. The value

$$\frac{1}{(m-1)!}\lim_{z \to z_0}\frac{d^{m-1}}{dz^{m-1}}\Big[(z - z_0)^m f(z)\Big]$$

is the residue of f(z) at the pole z_0. The residue is equal to

$$\frac{1}{2\pi i}\oint_{\gamma} f(z)\,dz,$$

where γ is any simple closed curve around z_0 within a domain around z_0 where f(z) is analytic except at z_0.

If f(z) has a simple pole at z_0 with residue w_0, then the function h(z) given by

$$h(z) = f(z) - \frac{w_0}{z - z_0}$$

is analytic at z_0 (once its removable singularity there is filled in).
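The contour-integral characterization of the residue can be checked the same way as the Cauchy integral formula. In the sketch below (an illustration; the examples and the trapezoidal discretization are our own choices) the integral (1/2πi)∮ f(z) dz over a small circle is compared with the derivative formula: e^z/(z − 1)² has a double pole at z = 1 with residue (d/dz e^z) at z = 1, that is e, and 1/(z² + 1) has a simple pole at z = i with residue 1/(2i) = −i/2.

```python
import cmath
import math


def residue_via_contour(f, z0, radius=0.5, points=512):
    """(1 / (2 pi i)) * contour integral of f over the circle |z - z0| = radius.

    With z = z0 + r e^{i theta}, dz = i r e^{i theta} d theta, so the integral equals
    (r / 2 pi) * integral_0^{2 pi} f(z) e^{i theta} d theta (trapezoidal rule below).
    Assumes f is analytic on and inside the circle except at z0.
    """
    total = 0j
    for k in range(points):
        w = cmath.exp(2j * math.pi * k / points)      # e^{i theta_k}
        total += f(z0 + radius * w) * w
    return radius * total / points


# Double pole at z = 1: residue = d/dz [e^z] at z = 1, i.e. e.
print(residue_via_contour(lambda z: cmath.exp(z) / (z - 1) ** 2, 1), math.e)
# Simple pole at z = i: residue = 1 / (2i) = -i/2.
print(residue_via_contour(lambda z: 1 / (z * z + 1), 1j))
```

The radius 0.5 keeps the second circle away from the other pole at −i, as the hypothesis requires.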

A function f(z) is meromorphic in a domain S if it is analytic in S except for poles, which by definition are isolated. We will see in the next section that via analytic continuation the zeta function ζ(s) can be considered as a meromorphic function in the whole complex plane with a simple pole at z = 1 with residue 1. Hence

$$\zeta(z) - \frac{1}{z-1} = H(z),$$

where H(z) is an entire function.

4.4.3 The Riemann Zeta Function


The Riemann zeta function starts with the Euler zeta function ζ(s) and extends it by allowing complex arguments s. That is,

$$\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s}, \qquad\text{where } s = \sigma + it \text{ and } \sigma, t \in \mathbb{R}. \qquad (4.4.3.1)$$
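Recall that for real numbers x > 0 and t we have

$$x^{it} = e^{it\ln x} = \cos(t\ln x) + i\sin(t\ln x).$$

It follows that |x^{it}| = 1. Therefore for each natural number n and s = σ + it we have

$$\left|\frac{1}{n^s}\right| = \left|\frac{1}{n^{\sigma+it}}\right| = \frac{1}{n^{\sigma}}\cdot\frac{1}{|n^{it}|} = \frac{1}{n^{\sigma}} = \frac{1}{n^{\operatorname{Re}s}}.$$

Consequently, by the p-series test, the series in (4.4.3.1) converges absolutely for Re s > 1, and uniformly on each half-plane Re s ≥ 1 + δ with δ > 0; hence it defines ζ(s) as an analytic function in the domain Re s > 1.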

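Since the basic formulas concerning the Euler product decomposition, and those tying ζ(s) to the von Mangoldt and Möbius functions, hold on the part s > 1 of the real line, by the uniqueness of analytic continuation they are still valid for complex arguments within the domain of analyticity Re s > 1. Thus we have

$$\zeta(s) = \prod_{p\ \text{prime}}\frac{1}{1 - p^{-s}}, \qquad s \in \mathbb{C},\ \operatorname{Re}s > 1;$$

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty}\frac{\Lambda(n)}{n^s}, \qquad s \in \mathbb{C},\ \operatorname{Re}s > 1;$$

and

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}, \qquad s \in \mathbb{C},\ \operatorname{Re}s > 1.$$

From the Euler product decomposition it is clear that ζ(s) has no zeros for Re s > 1.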

The initial step in studying the zeta function and applying it to the proof of the Prime Number Theorem is to show that it can be continued analytically to a function, also denoted ζ(s), which is meromorphic in all of ℂ. This is accomplished in several steps, but we first state the whole result.

Theorem 4.4.7 The Riemann zeta function ζ(s) can be analytically continued to a function, also denoted ζ(s), which is meromorphic in the whole plane. The only singularity of ζ(s) is a simple pole at s = 1 with residue 1, that is,

$$\zeta(s) = \frac{1}{s-1} + H(s),$$

where H(s) is an entire function.


As remarked above, for Re s > 1 it follows from the basic definition that ζ(s) is analytic. The first step is to continue ζ(s) analytically to a function that is analytic for Re s > 0 except at s = 1. To do this suppose first that Re s > 2. Then, by partial summation and since [x] = n on [n, n + 1),

$$\zeta(s) = \sum_{n=1}^{\infty} n\left(\frac{1}{n^s} - \frac{1}{(n+1)^s}\right) = s\sum_{n=1}^{\infty} n\int_n^{n+1} x^{-s-1}\,dx = s\int_1^{\infty}[x]\,x^{-s-1}\,dx.$$

This final integral defines an analytic function of s for Re s > 1, and therefore, by the uniqueness of analytic continuation, this integral formulation of ζ(s) holds for Re s > 1.

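Now consider the integral

$$s\int_1^{\infty} x\cdot x^{-s-1}\,dx = s\int_1^{\infty} x^{-s}\,dx = \frac{s}{s-1} = 1 + \frac{1}{s-1}, \qquad \operatorname{Re}s > 1.$$

Combining this with the integral representation of ζ(s) gives

$$\zeta(s) = \frac{s}{s-1} + s\int_1^{\infty}\big([x] - x\big)\,x^{-s-1}\,dx. \qquad (4.4.3.2)$$

Since 0 ≤ x − [x] < 1, the integral on the right-hand side converges absolutely for Re s > 0, and hence for Re s > 0 the right-hand side provides a meromorphic function whose only pole is the simple pole at s = 1 with residue 1, coming from s/(s − 1) = 1 + 1/(s − 1). Therefore this provides an analytic continuation of ζ(s) to such a meromorphic function in the whole half-plane Re s > 0.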
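Formula (4.4.3.2) gives a practical way to evaluate the continued ζ(s) for 0 < s < 1, where the defining series diverges. On [n, n + 1] the integrand is (n − x)x^{−s−1}, which integrates in closed form. The sketch below sums these pieces up to a cutoff N and replaces the remaining tail by −N^{−s}/(2s), using the fact that [x] − x has average −1/2; this tail approximation, the cutoff and the restriction to real s are our own choices, and the neglected error is of order N^{−s−1}.

```python
import math


def zeta_continued(s, N=20000):
    """zeta(s) for real s > 0, s != 1, from (4.4.3.2):
        zeta(s) = s/(s-1) + s * integral_1^inf ([x] - x) x^(-s-1) dx.
    On [n, n+1] the integrand is (n - x) x^(-s-1), integrated exactly; the tail
    beyond N uses the average value -1/2 of [x] - x (error of order N^(-s-1)).
    """
    if s <= 0 or s == 1:
        raise ValueError("this representation needs s > 0, s != 1")
    pieces = []
    for n in range(1, N):
        a, b = float(n), float(n + 1)
        # integral over [a, b] of n * x^(-s-1) - x^(-s)
        pieces.append(n * (a ** -s - b ** -s) / s - (b ** (1 - s) - a ** (1 - s)) / (1 - s))
    tail = -0.5 * N ** -s / s
    return s / (s - 1) + s * (math.fsum(pieces) + tail)


print(zeta_continued(2.0), math.pi ** 2 / 6)
print(zeta_continued(0.5))                       # inside the strip, where the series diverges
print(zeta_continued(0.999), zeta_continued(1.001))   # the simple pole at s = 1
```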
To proceed further we need the following functional relation involving ζ(s) and ζ(1 − s), which ties the Riemann zeta function to the complex Gamma function (see Theorem 4.4.4).

Theorem 4.4.8 The Riemann zeta function satisfies the functional relation

$$\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\!\left(\frac{1-s}{2}\right)\zeta(1-s)$$

or equivalently

$$\zeta(s) = 2^{s}\pi^{s-1}\sin\!\left(\frac{\pi s}{2}\right)\Gamma(1-s)\,\zeta(1-s), \qquad s \neq 0, 1.$$


Proof The proof uses certain facts about the complex Gamma function and another function known as the Jacobi theta function. This latter function is defined for u > 0 as

$$\theta(u) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 u}.$$

Using the theory of Fourier transforms (the Poisson summation formula) applied to the function f(x) = e^{−πx²u}, it can be shown that the Jacobi theta function satisfies the functional relation

$$\theta\!\left(\frac{1}{u}\right) = \sqrt{u}\,\theta(u).$$


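Now recall that

$$\Gamma(s) = \int_0^{\infty} e^{-y}\,y^{s-1}\,dy,$$

so that

$$\Gamma\!\left(\frac{s}{2}\right) = \int_0^{\infty} e^{-y}\,y^{s/2-1}\,dy.$$

Applying the change of variables y = πn²x and then renaming x as y, this becomes

$$\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)n^{-s} = \int_0^{\infty} y^{s/2-1}e^{-\pi n^2 y}\,dy.$$

This holds for each positive integer n ≥ 1. Summing over all the positive integers we get, for Re s > 1,

$$\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \int_0^{\infty}\Big(\sum_{n=1}^{\infty} e^{-\pi n^2 y}\Big)y^{s/2-1}\,dy = \int_0^{\infty}\theta_1(y)\,y^{s/2-1}\,dy, \qquad (4.4.3.3)$$

where θ_1(y) = ½(θ(y) − 1) = Σ_{n=1}^∞ e^{−πn²y}.

If we make the new change of variable z = 1/y, then from the functional relation for θ we have

$$\theta\!\left(\frac{1}{z}\right) = \sqrt{z}\,\theta(z) \quad\Longrightarrow\quad \theta_1\!\left(\frac{1}{z}\right) = \sqrt{z}\,\theta_1(z) + \frac{\sqrt{z} - 1}{2}.$$

Splitting the integral at y = 1 and using this change of variable gives us

$$\int_0^1 \theta_1(y)\,y^{s/2-1}\,dy = \frac{1}{s(s-1)} + \int_1^{\infty}\theta_1(z)\,z^{-s/2-1/2}\,dz.$$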


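Substituting this back into (4.4.3.3) we have

$$\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \frac{1}{s(s-1)} + \int_1^{\infty}\theta_1(y)\left(y^{s/2-1} + y^{-s/2-1/2}\right)dy. \qquad (4.4.3.4)$$

Since θ_1(y) decays like e^{−πy} as y → ∞, the integral on the right-hand side of (4.4.3.4) converges for every complex s and hence defines an entire function of s. Hence the whole right-hand side defines a meromorphic function which is invariant under the transformation s ↦ 1 − s. Therefore the left-hand side must also be invariant under this transformation, implying that

$$\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\!\left(\frac{1-s}{2}\right)\zeta(1-s), \qquad (4.4.3.5)$$

which is the desired functional relation.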

To obtain the equivalent formulation given in the statement of the theorem we use two properties of the Gamma function. The first is called the formula of complements and is given by

$$\Gamma(s)\,\Gamma(1-s) = \frac{\pi}{\sin(\pi s)}.$$

The second is called the duplication formula and is given by

$$\Gamma(s)\,\Gamma\!\left(s + \frac{1}{2}\right) = \sqrt{\pi}\,2^{1-2s}\,\Gamma(2s).$$

The duplication formula was originally given by Legendre. Using these formulas in (4.4.3.5) the relation becomes

$$\zeta(s) = 2^{s}\pi^{s-1}\sin\!\left(\frac{\pi s}{2}\right)\Gamma(1-s)\,\zeta(1-s), \qquad s \neq 0, 1.$$

We leave the details to the exercises.
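The theta relation θ(1/u) = √u θ(u), used in the proof without derivation, is easy to test numerically because the terms e^{−πn²u} decay very rapidly. The sketch below is an illustration only; it truncates the series at |n| ≤ 60, an arbitrary cutoff that is ample for the values of u tried.

```python
import math


def theta(u, N=60):
    """Truncated Jacobi theta function: sum over |n| <= N of exp(-pi n^2 u), for u > 0."""
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * u) for n in range(1, N + 1))


for u in (0.1, 0.5, 1.0, 3.0, 7.5):
    print(u, theta(1 / u), math.sqrt(u) * theta(u))
```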


Note that the functional relation has the form

$$\zeta(s) = K(s)\,\zeta(1-s),$$

where

$$K(s) = 2^{s}\pi^{s-1}\sin\!\left(\frac{\pi s}{2}\right)\Gamma(1-s).$$

The transformation s ↦ 1 − s has s = 1/2 as its center of symmetry. Therefore, since ζ(s) is already defined for Re s > 0, and in particular for Re s ≥ 1/2, the functional equation can be used to continue ζ(s) to a function defined for Re s < 1/2, and hence over the whole complex plane.

From the analytic continuation of the Gamma function it follows that the function K(s) has singularities, that is, becomes infinite, at the positive odd integers 2n + 1, n ≥ 1: there Γ(1 − s) has a pole while sin(πs/2) = ±1. However, ζ(2n + 1) is finite for all n ≥ 1. Hence, from the functional relation, this is possible only if ζ(1 − s) = 0 for s = 2n + 1. Therefore ζ(s) = 0 at all the negative even integers −2, −4, …. These are called the trivial zeros of ζ(s).
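The functional equation can be used directly to evaluate ζ at negative arguments, where the defining series diverges. The sketch below (illustrative; the Euler-Maclaurin tail correction used to compute ζ(1 − s) is our own choice) evaluates ζ(s) = K(s)ζ(1 − s) at the negative integers, reproducing the trivial zeros at −2, −4, … and the classical values ζ(−1) = −1/12, ζ(−3) = 1/120, ζ(−5) = −1/252, which also follow from Theorem 4.4.3.

```python
import math


def zeta_em(k, N=100):
    """zeta(k) for real k > 1: partial sum up to N-1 plus Euler-Maclaurin tail terms."""
    head = sum(n ** -k for n in range(1, N))
    return head + N ** (1 - k) / (k - 1) + 0.5 * N ** -k + k * N ** (-k - 1) / 12


def zeta_via_functional_equation(s):
    """zeta(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s) zeta(1 - s), here for real s < 0."""
    K = 2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2) * math.gamma(1 - s)
    return K * zeta_em(1 - s)


for s in range(-1, -9, -1):
    print(s, zeta_via_functional_equation(s))
```

At the even negative integers the computed values are rounding noise, because sin(πs/2) vanishes there.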

The functional equation also establishes that s = 1 is the only singularity of ζ(s) in the whole complex plane. This follows from the fact that ζ(s) has only a simple pole at s = 1 for Re s ≥ 1/2 and that the only singularities of K(s) are at the positive odd integers; at s = 0, where ζ(1 − s) has its pole, the zero of sin(πs/2) cancels it. Hence by analytic continuation this is true over the whole plane. Further, the fact that s = 1 is a simple pole with residue 1 follows from the integral representation (4.4.3.2) of ζ(s). These last comments complete the proof of Theorem 4.4.7.

What becomes crucial in applying the zeta function to the proof of the prime number theorem is the location of its zeros. In particular, we will see in the next section that the fact that ζ(s) has no zeros on the line Re s = 1 is equivalent to the prime number theorem. We have already seen that ζ(s) has zeros at s = −2, −4, …. These are called the trivial zeros. Riemann in his original paper showed that any nontrivial zeros must fall in the critical strip 0 < Re s < 1. Further, he conjectured that all the nontrivial zeros lie on the line Re s = 1/2, which is called the critical line. This is called the Riemann hypothesis and is still an open question. It has resisted solution since Riemann’s paper of 1859, for more than a hundred and fifty years, and has had tremendous impact on both Number Theory and other branches of mathematics. Now that Fermat’s last theorem has been settled, the Riemann hypothesis can be considered the outstanding open problem in mathematics. We will say more about the Riemann hypothesis after we show that there are no zeros on the line Re s = 1. This fact was the fundamental step in the proofs of the prime number theorem by both Hadamard and de la Vallée Poussin. Their proofs were independent and appear different but are essentially the same (see [Na]).


Theorem 4.4.9 The Riemann zeta function ζ(s) has no zeros on the line Re s = 1.


Proof The proof we give is a simplification of the proofs of Hadamard and de la Vallée Poussin and was given by Mertens in 1898 (see [Na]). The starting point is the inequality

$$3 + 4\cos\theta + \cos(2\theta) = 2(1 + \cos\theta)^2 \ge 0 \qquad\text{for all real } \theta.$$

Now suppose that ζ(1 + it) = 0 for some real t ≠ 0. Then let

$$\phi(s) = \zeta^3(s)\,\zeta^4(s + it)\,\zeta(s + 2it).$$

The factor ζ(s + 2it) is analytic at s = 1, since 1 + 2it ≠ 1, and the pole of order 3 at s = 1 of ζ³(s) cannot cancel the zero of order at least 4 of ζ⁴(s + it). It would follow that φ(s) is analytic at s = 1 with φ(1) = 0, and hence that

$$\ln|\phi(s)| \to -\infty \qquad\text{as } s \to 1.$$


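Now take s to be real with s > 1. By the Euler product decomposition, for any real t,

$$\ln|\zeta(s+it)| = \operatorname{Re}\big(\ln\zeta(s+it)\big) = -\operatorname{Re}\Big(\sum_{p}\ln\big(1 - p^{-s-it}\big)\Big) = \operatorname{Re}\Big(\sum_{p}\Big(p^{-s-it} + \frac{1}{2}(p^2)^{-s-it} + \frac{1}{3}(p^3)^{-s-it} + \cdots\Big)\Big) = \operatorname{Re}\Big(\sum_{n=1}^{\infty} a_n n^{-s-it}\Big)$$

with a_n ≥ 0; explicitly, a_n = 1/m if n = p^m is a power of a prime p and a_n = 0 otherwise.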


Then

$$\ln|\phi(s)| = \operatorname{Re}\Big(\sum_{n=1}^{\infty} a_n n^{-s}\big(3 + 4n^{-it} + n^{-2it}\big)\Big) = \sum_{n=1}^{\infty} a_n n^{-s}\big(3 + 4\cos(t\ln n) + \cos(2t\ln n)\big).$$

However, this last sum is ≥ 0 by the trigonometric inequality given at the beginning of the proof, so |φ(s)| ≥ 1 for all real s > 1. This contradicts the fact that ln|φ(s)| → −∞ as s → 1⁺, and the contradiction implies that ζ(1 + it) ≠ 0.
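The heart of the proof is that each Euler factor of φ(s) = ζ³(s)ζ⁴(s + it)ζ(s + 2it) has modulus at least 1 when s > 1. Writing x = p^{−s} and θ = t ln p, that factor is the reciprocal of |(1 − x)³(1 − xe^{−iθ})⁴(1 − xe^{−2iθ})|. The sketch below is a random spot check, not a proof: it confirms the trigonometric identity and that this quantity never exceeds 1 for 0 ≤ x < 1. The function name, sample size and seed are arbitrary choices of ours.

```python
import cmath
import math
import random


def local_factor(x, theta):
    """|(1 - x)^3 (1 - x e^{-i theta})^4 (1 - x e^{-2 i theta})| with x = p^(-s), theta = t ln p.

    Its reciprocal is the contribution of one prime p to |zeta^3(s) zeta^4(s+it) zeta(s+2it)|.
    """
    w = cmath.exp(-1j * theta)
    return abs((1 - x) ** 3 * (1 - x * w) ** 4 * (1 - x * w * w))


random.seed(0)
max_identity_error = 0.0
worst = 0.0
for _ in range(50000):
    theta = random.uniform(0.0, 2 * math.pi)
    x = random.uniform(0.0, 0.999)
    lhs = 3 + 4 * math.cos(theta) + math.cos(2 * theta)
    max_identity_error = max(max_identity_error, abs(lhs - 2 * (1 + math.cos(theta)) ** 2))
    worst = max(worst, local_factor(x, theta))

print(max_identity_error, worst)   # worst stays <= 1, so every Euler factor of phi has modulus >= 1
```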


Theorem 4.4.9 will imply the prime number theorem in roughly the following manner; this will be made precise in the next section. Recall that the prime number theorem is equivalent to ψ(x) ~ x, where ψ(x) is the Chebyshev function. Therefore we want to show that ψ(x) ~ x. Now

$$\psi(x) = \sum_{n \le x}\Lambda(n) \qquad\text{and}\qquad [x] = \sum_{n \le x} 1.$$

Therefore we want to show that, on average, the von Mangoldt function Λ(n) looks like 1 as n → ∞. We have further

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty}\frac{\Lambda(n)}{n^s}.$$

If Re s > 1 we can obtain an integral representation of this:

$$-\frac{\zeta'(s)}{\zeta(s)} = s\int_1^{\infty}\psi(x)\,x^{-s-1}\,dx.$$

If there are no zeros of ζ(s) on the line Re s = 1, then by complex integration this integral can be handled and in turn used to show that ψ(x) ~ x.
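The heuristic that Λ(n) looks like 1 on average can be seen numerically. The sketch below (illustrative; the prime-power sieve and the cutoff 10⁶ are our own choices) tabulates ψ(x) = Σ_{n≤x} Λ(n) and prints ψ(x)/x. As a sanity check, e^{ψ(n)} is the least common multiple of 1, 2, …, n, so ψ(10) = ln 2520.

```python
import math


def psi_table(x_max):
    """Chebyshev psi(n) = sum_{m <= n} Lambda(m) for n = 0, ..., x_max."""
    lam = [0.0] * (x_max + 1)
    composite = bytearray(x_max + 1)
    for p in range(2, x_max + 1):
        if composite[p]:
            continue
        if p * p <= x_max:
            composite[p * p :: p] = b"\x01" * len(range(p * p, x_max + 1, p))
        log_p = math.log(p)
        pk = p
        while pk <= x_max:               # Lambda(p^k) = ln p
            lam[pk] = log_p
            pk *= p
    psi = [0.0] * (x_max + 1)
    for n in range(1, x_max + 1):
        psi[n] = psi[n - 1] + lam[n]
    return psi


X = 10 ** 6
psi = psi_table(X)
for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(x, round(psi[x], 1), psi[x] / x)   # psi(x) versus [x] = x
```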

Before closing this section we make some further comments on the zeros and on the Riemann hypothesis. Hardy in 1914 proved that ζ(s) has infinitely many zeros on the line Re s = 1/2. As of 2002, it was known that at least the first billion and a half nontrivial zeros of ζ(s) lie on the critical line.

Selberg in 1942 showed that a positive proportion of the nontrivial zeros lie on the critical line. Levinson in 1974 improved this to show that at least 1/3 of the nontrivial zeros are on the critical line. This has subsequently been improved to at least 40% of the nontrivial zeros.

There are several quantitative statements that are equivalent to the Riemann hypothesis. Koch in 1901 showed that the Riemann hypothesis is equivalent to

$$\pi(x) = \operatorname{Li}(x) + O\big(\sqrt{x}\,\ln x\big),$$

where Li(x) is the logarithmic integral function of Gauss

$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}.$$

In a similar manner, the Riemann hypothesis can be shown to be equivalent to

$$\pi(x) = \operatorname{Li}(x) + O\big(x^{1/2+\epsilon}\big) \qquad\text{for every } \epsilon > 0.$$
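The sketch below compares π(x) with Li(x) = ∫_2^x dt/ln t for x up to 10⁶, together with the size √x ln x of the error term in Koch’s criterion. It is only an illustration and says nothing about the Riemann hypothesis itself. Li(x) is computed as li(x) − li(2) from the standard series li(x) = γ + ln ln x + Σ_{k≥1} (ln x)^k/(k·k!), with γ ≈ 0.5772156649 Euler’s constant; using this series rather than numerical quadrature is our choice, and the helper names are our own.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant


def li(x):
    """li(x) = gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!), valid for x > 1."""
    L = math.log(x)
    total, term, k = 0.0, 1.0, 0
    while True:
        k += 1
        term *= L / k                    # term = L^k / k!
        total += term / k
        if k > L and term / k < 1e-17 * total:
            break
    return EULER_GAMMA + math.log(L) + total


def Li(x):
    """Offset logarithmic integral: integral_2^x dt / ln t = li(x) - li(2)."""
    return li(x) - li(2)


X = 10 ** 6
is_prime = bytearray([1]) * (X + 1)
is_prime[0] = is_prime[1] = 0
for p in range(2, math.isqrt(X) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = bytes(len(range(p * p, X + 1, p)))


def pi_x(x):
    """Number of primes <= x, for x <= X."""
    return sum(is_prime[: x + 1])


for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(x, pi_x(x), round(Li(x), 1), round(math.sqrt(x) * math.log(x), 1))
```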


An entirely elementary formulation of the Riemann hypothesis is the following (see [P]). Define a positive squarefree integer n to be red if it is the product of an even number of distinct primes and blue if it is the product of an odd number of distinct primes. Let R(n) be the number of red integers not exceeding n and B(n) the number of blue integers not exceeding n. The Riemann hypothesis is equivalent to the statement that for any ε > 0 there exists an N such that for all n > N

$$|R(n) - B(n)| < n^{1/2+\epsilon}.$$
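The red and blue counts can be computed directly. Note that R(n) − B(n) = Σ_{k≤n} μ(k), since μ(k) = +1 for red k, −1 for blue k, and 0 otherwise. The sketch below assumes that 1, the product of zero primes, counts as red (the text does not say); it counts red and blue integers up to 10⁴ with a smallest-prime-factor sieve and compares |R(n) − B(n)| with √n. Such a finite computation proves nothing about the Riemann hypothesis; it only shows the sizes involved. The function name is our own.

```python
import math


def red_blue_counts(n_max):
    """R(n), B(n) for n = 0..n_max, counting 1 (a product of zero primes) as red."""
    spf = list(range(n_max + 1))                 # smallest prime factor
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    red = [0] * (n_max + 1)
    blue = [0] * (n_max + 1)
    for k in range(1, n_max + 1):
        m, prime_factors, squarefree = k, 0, True
        while m > 1:
            p = spf[m]
            m //= p
            if m % p == 0:                       # p^2 divides k
                squarefree = False
                break
            prime_factors += 1
        red[k], blue[k] = red[k - 1], blue[k - 1]
        if squarefree:
            if prime_factors % 2 == 0:
                red[k] += 1
            else:
                blue[k] += 1
    return red, blue


LIMIT = 10 ** 4
red, blue = red_blue_counts(LIMIT)
print(red[10], blue[10])                         # 3 4: red 1, 6, 10; blue 2, 3, 5, 7
worst = max(abs(red[n] - blue[n]) / math.sqrt(n) for n in range(2, LIMIT + 1))
print(worst)
```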


We mention one major extension of the Riemann hypothesis. Recall that for an integer k a Dirichlet L-series is defined by

$$L(s, \chi) = \sum_{n=1}^{\infty}\frac{\chi(n)}{n^s},$$

where χ is a character mod k and s is a complex variable (see Chapter 3). Recall further that Dirichlet L-series also have Euler product representations. The generalized Riemann hypothesis is that the nontrivial zeros of any Dirichlet L-series also lie on the critical line Re s = 1/2.
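For Re s > 1 the Euler product of a Dirichlet L-series can be compared with its series. The sketch below takes, as an illustrative choice, the nontrivial character χ mod 4 (χ(n) = 0 for even n, +1 for n ≡ 1 and −1 for n ≡ 3 mod 4). Then L(2, χ) is Catalan’s constant, approximately 0.9159655942, and the series also converges (conditionally) at s = 1 to π/4 (Leibniz’s series). The code checks only these values and the Euler product; it does not touch the generalized Riemann hypothesis. The function names are our own.

```python
import math


def chi4(n):
    """Nontrivial Dirichlet character mod 4: 0 for even n, +1 if n = 1 mod 4, -1 if n = 3 mod 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def L_series(s, terms=10 ** 5):
    """Partial sum of sum_{n <= terms} chi(n) / n^s."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))


def L_euler_product(s, p_max=10 ** 5):
    """Truncated Euler product prod_{p <= p_max} (1 - chi(p) p^(-s))^(-1)."""
    is_prime = bytearray([1]) * (p_max + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(p_max) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, p_max + 1, p)))
    return math.prod(1 / (1 - chi4(p) / p ** s) for p in range(2, p_max + 1) if is_prime[p])


print(L_series(2), L_euler_product(2))   # both close to Catalan's constant
print(L_series(1), math.pi / 4)          # Leibniz: 1 - 1/3 + 1/5 - ... = pi/4
```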


4.5 The Prime Number Theorem 


We are now ready to prove the prime number theorem. 
Theorem 4.5.1 π(x) ~ x/ln x.

As we have already mentioned, the proof depends on the fact that ζ(s) has no zeros on the line Re s = 1. The original proofs were given by Hadamard and de la Vallée Poussin and were quite complicated. An exposition and commentary on the original proofs can be found in the book of Narkiewicz [Na]. The proof was somewhat simplified by Wiener and others but still remained quite complicated. In 1980, D.J. Newman found a way to give a proof using only fairly straightforward facts about complex integration, which allowed a relatively short proof to be presented. The proof we give is based on Newman’s method.

In another direction, in 1949 Selberg and then Erdős came up with an “elementary proof” of the prime number theorem along the lines that Chebyshev had begun a century earlier. This proof is elementary only in the sense that it does not use complex analysis; it is in fact more complicated than the complex analytic proofs. We will say more about the elementary proof in the next section.

Newman’s method is based on the following theorem and the subsequent corollary. 
We will state them and then show how they imply the proof of the prime number 
theorem. After this we will go back and prove them. 


Theorem 4.5.2 Let F(t) be bounded on (0, ∞) and integrable over every finite subinterval, and suppose that the Laplace transform

$$G(s) = \int_0^{\infty} F(t)\,e^{-st}\,dt$$

is well-defined and analytic throughout the open half-plane Re s > 0. Suppose further that G(s) can be continued analytically to a neighborhood of every point of the imaginary axis. Then

$$\int_0^{\infty} F(t)\,dt$$

exists and equals G(0).

Corollary 4.5.1 Let f(x) be nonnegative, nondecreasing and O(x) on [1, ∞), so that the function

$$g(s) = s\int_1^{\infty} f(x)\,x^{-s-1}\,dx$$

is well-defined and analytic throughout the half-plane Re s > 1 (g(s) is called the Mellin transform of f(x)). Suppose further that for some constant c the function

$$G(s) = g(s) - \frac{c}{s-1}$$

can be continued analytically to a neighborhood of every point on the line Re s = 1. Then

$$\frac{f(x)}{x} \to c \qquad\text{as } x \to \infty.$$


The proof of the prime number theorem now follows easily from the corollary. 


Proof (Theorem 4.5.1). Recall that the prime number theorem is equivalent to ψ(x) ~ x, that is,

$$\frac{\psi(x)}{x} \to 1 \qquad\text{as } x \to \infty.$$

Take f(x) in the corollary to be ψ(x). Since ψ(x) is nonnegative, nondecreasing and O(x) on [1, ∞), we must show that the other conditions of the corollary apply. We have already seen (see Section 4.4) that

$$g(s) = s\int_1^{\infty}\psi(x)\,x^{-s-1}\,dx = -\frac{\zeta'(s)}{\zeta(s)}.$$

Since ζ(s) has a simple pole with residue 1 at s = 1, the logarithmic derivative −ζ′(s)/ζ(s), and hence g(s), also has a simple pole with residue 1 at s = 1. The analyticity of ζ(s) at the points of Re s = 1 with s ≠ 1 and its nonvanishing on this line then imply that g(s) can be continued analytically to a neighborhood of each such point. Hence

$$G(s) = g(s) - \frac{1}{s-1}$$

has an analytic continuation to a neighborhood of the closed half-plane Re s ≥ 1. Therefore the conditions of the corollary are met (with c = 1), and hence

$$\frac{\psi(x)}{x} \to 1 \qquad\text{as } x \to \infty.$$

We now give the proofs of Theorem 4.5.2 and Corollary 4.5.1.

Proof (Theorem 4.5.2) We suppose that F(t) is bounded on (0, ∞) and that its Laplace transform

$$G(s) = \int_0^{\infty} F(t)\,e^{-st}\,dt$$

is well-defined and analytic throughout Re s > 0. We suppose further that G(s) can be continued analytically to a neighborhood of every point of the imaginary axis. Therefore we have a function, which we will also call G(s), that is analytic on a neighborhood of the closed half-plane Re s ≥ 0. In particular, for each R > 0 there is a δ = δ(R) > 0, chosen small enough, such that G(s) is analytic on the region Re s ≥ −δ, |s| ≤ R.

Since F(t) is bounded, without loss of generality we may assume that |F(t)| ≤ 1 for t ≥ 0. For λ > 0 let

$$G_\lambda(s) = \int_0^{\lambda} F(t)\,e^{-st}\,dt.$$

Since this is an integral over a finite interval and F(t) is bounded, G_λ(s) is analytic for all s, that is, entire, for every finite λ. We must show that

$$G_\lambda(0) = \int_0^{\lambda} F(t)\,dt \to G(0) \qquad\text{as } \lambda \to \infty.$$


For $R > 0$ choose $\delta = \delta(R)$ so that $G(s)$ is analytic on and within the closed curve $W$, where $W$ is given by the arc of the circle $|z| = R$ with $\mathrm{Re}\, z \ge -\delta$ together with the line segment $\mathrm{Re}\, z = -\delta$ joining its endpoints. We picture this in Figure 4.1.

We orient $W$ counterclockwise and let $W_+$ be the part of $W$ with $\mathrm{Re}\, z > 0$ and $W_-$ the part of $W$ with $\mathrm{Re}\, z < 0$.

Now for each $\lambda$ the function $G(s) - G_\lambda(s)$ is analytic at $s = 0$. Therefore by the Cauchy integral formula (Theorem 4.4.6 part (3)) we have


$$G(0) - G_\lambda(0) = \frac{1}{2\pi i} \oint_W \frac{G(z) - G_\lambda(z)}{z}\, dz. \tag{4.5.1}$$

Fig. 4.1 Curve W
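We have the following inequalities, which will be needed to evaluate the final limit. First, for $x = \mathrm{Re}\, s > 0$,

$$|G(s) - G_\lambda(s)| = \left| \int_\lambda^{\infty} F(t)\, e^{-st}\, dt \right| \le \int_\lambda^{\infty} e^{-xt}\, dt = \frac{e^{-\lambda x}}{x}.$$

Next, for $x = \mathrm{Re}\, s < 0$,

$$|G_\lambda(s)| = \left| \int_0^{\lambda} F(t)\, e^{-st}\, dt \right| \le \int_0^{\lambda} e^{-xt}\, dt < \frac{e^{-\lambda x}}{|x|}.$$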


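Next, if we let $H(z) = e^{\lambda z} G(z)$ and $H_\lambda(z) = e^{\lambda z} G_\lambda(z)$, then clearly $H(0) = G(0)$ and $H_\lambda(0) = G_\lambda(0)$, so

$$H(0) - H_\lambda(0) = G(0) - G_\lambda(0).$$

Further, within and on $W$ the function $(G(z) - G_\lambda(z))\, e^{\lambda z}\, z/R^2$ is analytic, so that

$$\oint_W \big(G(z) - G_\lambda(z)\big)\, e^{\lambda z}\, \frac{z}{R^2}\, dz = 0$$

by Cauchy's Theorem. Therefore, combining these observations with (4.5.1) applied to $H - H_\lambda$, we get

$$G(0) - G_\lambda(0) = H(0) - H_\lambda(0) = \frac{1}{2\pi i} \oint_W \big(G(z) - G_\lambda(z)\big)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) dz.$$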


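On the circle $|z| = R$ we have $1/z = \bar{z}/R^2$, and therefore, writing $x = \mathrm{Re}\, z$,

$$\left| \frac{1}{z} + \frac{z}{R^2} \right| = \frac{2|x|}{R^2},$$

and hence on $W_+$

$$\left| \big(G(z) - G_\lambda(z)\big)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) \right| \le \frac{e^{-\lambda x}}{x} \cdot e^{\lambda x} \cdot \frac{2x}{R^2} = \frac{2}{R^2}.$$

It follows, since $W_+$ is a semicircle of length $\pi R$, that

$$\left| \frac{1}{2\pi i} \int_{W_+} \big(G(z) - G_\lambda(z)\big)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) dz \right| \le \frac{1}{2\pi} \cdot \pi R \cdot \frac{2}{R^2} = \frac{1}{R}.$$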


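Now we consider the integral over $W_-$, treating the terms involving $G$ and $G_\lambda$ separately. Since $G_\lambda(s)$ is analytic for all $s$, we may replace, using Cauchy's Theorem, the path $W_-$ in the $G_\lambda$ integral by the semicircle $W_-^{*} = \{|z| = R,\ \mathrm{Re}\, z < 0\}$. Then by our previous inequalities

$$\left| \frac{1}{2\pi i} \int_{W_-} G_\lambda(z)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) dz \right| = \left| \frac{1}{2\pi i} \int_{W_-^{*}} G_\lambda(z)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) dz \right| \le \frac{1}{2\pi} \cdot \pi R \cdot \frac{e^{-\lambda x}}{|x|} \cdot e^{\lambda x} \cdot \frac{2|x|}{R^2} = \frac{1}{R}.$$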


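Now consider

$$\frac{1}{2\pi i} \int_{W_-} G(z)\, e^{\lambda z} \left(\frac{1}{z} + \frac{z}{R^2}\right) dz. \tag{4.5.2}$$

Since $G(s)$ is analytic on $W_-$, there exists a constant $B$ depending on $\delta$ and on $R$ such that

$$\left| G(s) \left(\frac{1}{s} + \frac{s}{R^2}\right) \right| \le B \quad \text{on } W_-.$$

It follows that

$$\left| G(s)\, e^{\lambda s} \left(\frac{1}{s} + \frac{s}{R^2}\right) \right| \le B e^{\lambda x} \quad \text{on } W_-, \qquad x = \mathrm{Re}\, s.$$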


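Therefore, on the part of $W_-$ where $x \le -\delta_1 < 0$ (with $0 < \delta_1 < \delta$ small), the integrand in (4.5.2) tends to zero uniformly as $\lambda \to \infty$. On the remaining small part of $W_-$, where $-\delta_1 < x < 0$, the integrand is bounded by $B$, and the length of this part tends to zero as $\delta_1 \to 0$. Hence, for a fixed $W$ chosen as above, the integral in (4.5.2) tends to zero as $\lambda \to \infty$.

Now we put all of this together. Given $\varepsilon > 0$, choose $R = 1/\varepsilon$. Choose $\delta$ as above so that $G(s)$ is analytic within and on $W$. Finally, determine a value $\lambda_0$ so that the integral (4.5.2) is bounded in absolute value by $\varepsilon$ for all $\lambda > \lambda_0$. Combining all the inequalities, we get

$$|G(0) - G_\lambda(0)| \le \frac{1}{R} + \frac{1}{R} + \varepsilon = 3\varepsilon \quad \text{for } \lambda > \lambda_0.$$

Therefore

$$G_\lambda(0) \to G(0) \quad \text{as } \lambda \to \infty.$$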


The corollary follows in a relatively straightforward manner from this theorem. 


Proof (Corollary 4.5.1) We suppose that $f(x)$ and $g(s)$ satisfy the conditions given in Corollary 4.5.1. That is, $f(x)$ is nonnegative, nondecreasing, and $O(x)$ on $[1, \infty)$, and

$$g(s) = s \int_1^{\infty} f(x)\, x^{-s-1}\, dx$$

is well-defined and analytic throughout the half-plane $\mathrm{Re}\, s > 1$. Further, there is a constant $c$ so that the function

$$G(s) = g(s) - \frac{c}{s - 1}$$

can be continued analytically to a neighborhood of every point on the line $\mathrm{Re}\, s = 1$. Now let $x = e^t$ and define
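$$F(t) = e^{-t} f(e^t) - c.$$

Since $f(x) = O(x)$, it follows that $F(t)$ is bounded on $(0, \infty)$. The Laplace transform of $F(t)$ is given, after the substitution $x = e^t$, by

$$\int_0^{\infty} \big(e^{-t} f(e^t) - c\big)\, e^{-st}\, dt = \int_1^{\infty} f(x)\, x^{-s-2}\, dx - \frac{c}{s} = \frac{g(s+1)}{s+1} - \frac{c}{s}.$$

Since

$$\frac{g(s+1)}{s+1} - \frac{c}{s} = \frac{1}{s+1}\left(g(s+1) - \frac{c}{s}\right) - \frac{c}{s+1},$$

the conditions on $g(s)$ imply that this Laplace transform can be continued analytically to a neighborhood of every point of the imaginary axis.

Now apply Theorem 4.5.2 to this Laplace transform. From this it follows, after substituting $t = \ln x$, that the improper integrals

$$\int_0^{\infty} \big(e^{-t} f(e^t) - c\big)\, dt = \int_1^{\infty} \frac{f(x) - cx}{x^2}\, dx \tag{4.5.3}$$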


exist. Since $f(x)$ is a nondecreasing function, this implies that

$$\frac{f(x)}{x} \to c \quad \text{as } x \to \infty.$$


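To see this last assertion, suppose that $\limsup_{x\to\infty} f(x)/x > c$. Then there would exist a $\delta > 0$ such that for certain arbitrarily large $y$

$$f(y) \ge (c + 2\delta)\, y.$$

Since $f(x)$ is nondecreasing, it would then follow that

$$f(x) \ge f(y) \ge (c + 2\delta)\, y \ge (c + \delta)\, x \quad \text{for } y \le x \le \sigma y,$$

where $\sigma = \dfrac{c + 2\delta}{c + \delta}$. Then

$$\int_y^{\sigma y} \frac{f(x) - cx}{x^2}\, dx \ge \int_y^{\sigma y} \frac{\delta}{x}\, dx = \delta \ln \sigma.$$

But this is bounded away from zero for arbitrarily large $y$, contradicting the convergence of the improper integral in (4.5.3). Therefore $\limsup_{x\to\infty} f(x)/x \le c$.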

Next suppose that $\liminf_{x\to\infty} f(x)/x < c$. Then, in a similar manner, there are a $\delta > 0$ and a $\sigma < 1$ such that for arbitrarily large $y$ we have $f(x) \le (c - \delta)\, x$ on the interval $\sigma y \le x \le y$. Applying this to the integral, we obtain

$$\int_{\sigma y}^{y} \frac{f(x) - cx}{x^2}\, dx \le \int_{\sigma y}^{y} \frac{-\delta}{x}\, dx = \delta \ln \sigma.$$

This is negative and again bounded away from zero, contradicting the convergence of the improper integrals. It follows that $\liminf_{x\to\infty} f(x)/x \ge c$.

Since $\liminf f(x)/x \le \limsup f(x)/x$, it follows that

$$\liminf_{x\to\infty} \frac{f(x)}{x} = \limsup_{x\to\infty} \frac{f(x)}{x} = c,$$

and therefore the limit exists and equals $c$, completing the proof of the corollary.


We have seen that the absence of zeros of $\zeta(s)$ on the line $\mathrm{Re}\, s = 1$ implies the prime number theorem. It was pointed out by Wiener that the converse is also true, and hence the prime number theorem is equivalent to the fact that there are no zeros of $\zeta(s)$ on $\mathrm{Re}\, s = 1$.


Theorem 4.5.3. The prime number theorem is equivalent to the fact that there are no zeros of $\zeta(s)$ on the line $\mathrm{Re}\, s = 1$.


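Proof We have already seen that the absence of zeros implies the prime number theorem. Suppose now that $\psi(x) \sim x$ and $\zeta(1 + it) = 0$ with $t$ real and $t \neq 0$. Then, if the order of the zero is $m \ge 1$, we have the expansion

$$\zeta(s) = c\,\big(s - (1 + it)\big)^m + \cdots, \qquad c \neq 0,$$

which is valid on a neighborhood of $1 + it$. Let

$$g(s) = -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$

The expansion above would imply that

$$\lim_{s \to 1^+} (s - 1)\, g(s + it) = -m,$$

where $s$ tends to 1 through real values. Further,

$$g(s) = \frac{s}{s - 1} + s \int_1^{\infty} \big(\psi(y) - y\big)\, y^{-s-1}\, dy \quad \text{for } \mathrm{Re}\, s > 1.$$

Then, since $\psi(y) - y = o(y)$ implies $\int_1^{\infty} |\psi(y) - y|\, y^{-s-1}\, dy = o\big(1/(s-1)\big)$ as $s \to 1^+$, we get

$$|(s - 1)\, g(s + it)| \le |s - 1|\, |s + it| \left( \frac{1}{|s - 1 + it|} + \int_1^{\infty} |\psi(y) - y|\, y^{-s-1}\, dy \right) = o(1)$$

as $s \to 1^+$. This would imply that $m = 0$, contradicting the existence of a zero on the line $\mathrm{Re}\, s = 1$.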
4.6 The Elementary Proof


As we have noted, Chebyshev's theorem (Theorem 4.2.1) appeared to be quite close to the Prime Number Theorem. It provided the right bounds, and further Chebyshev showed that if $\lim_{x\to\infty} \pi(x) \ln x / x$ existed then the value of the limit must be one. Chebyshev's methods were elementary in the sense that they involved no analysis more complicated than simple real integration and the properties of the logarithmic function (although the proofs themselves were complicated). This would seem appropriate for a proof of a theorem about primes, since primes are in the realm of arithmetic and should not require deep analytic notions. However, Chebyshev could not establish that the limit exists, and then Riemann, ten years or so later, tried a different approach using the theory of complex analytic functions. As discussed in the last section, the proof of the prime number theorem was reduced to knowing the location of the zeros of the complex analytic Riemann zeta function. Still, even with Riemann's ideas, the proof resisted solution for another 36 years, and during this time many mathematicians began to doubt that the limit $\lim_{x\to\infty} \pi(x) \ln x / x$ exists. These doubts were put to rest with the proofs of Hadamard and de la Vallée Poussin. As we have proved (Theorem 4.5.3), the prime number theorem, a result seemingly arising in arithmetic, is equivalent to the result that there are no zeros of the Riemann zeta function $\zeta(s)$ along the line $\mathrm{Re}(s) = 1$, a result really in complex analysis. This raised the question of the actual relationship between the distribution of primes and complex function theory. This led to the further question of whether there could exist an elementary proof of the prime number theorem along the lines of Chebyshev's methods.

The opinion that came to prevail was that it was doubtful that such a proof existed. 
The feeling was that complex analysis was somehow deeper than real analysis and 
in view of the equivalence mentioned above it would be unlikely to prove the prime 
number theorem using just the methods of real analysis. On the other hand it was 
felt that if such a proof existed it would open up all sorts of new avenues in Number 
Theory. 

The English mathematician G.H. Hardy, who made major contributions to the study of the relationship between the prime counting function $\pi(x)$ and Gauss's logarithmic integral function $\mathrm{Li}(x)$, described the situation this way in a lecture in 1921 (see [N]).


G.H. Hardy: No elementary proof of the prime number theorem is known and one may ask whether it is reasonable to expect one. Now we know that the theorem is roughly equivalent to a theorem about an analytic function, the theorem that Riemann's zeta function has no roots on a certain line. A proof of such a theorem, not fundamentally dependent upon the ideas of the theory of functions, seems to me to be extraordinarily unlikely. It is rash to assert that a mathematical theorem cannot be proved in a particular way; but one thing seems quite clear. We have certain views about the logic of the theory; we think that some theorems, as we say, “lie deep” and others nearer to the surface. If anyone produces an elementary proof of the prime number theorem, he will show that these views are wrong, that the subject does not hang together in the way we have supposed, and that it is time for the books to be cast aside and for the theory to be rewritten.


However, what actually occurred was even more surprising. Selberg, and then Erdős, and then Erdős and Selberg together in 1948 developed elementary proofs of the prime number theorem along the lines of Chebyshev's methods. All of these proofs depended on asymptotic estimates for an extension of the von Mangoldt function. These asymptotic estimates are now called Selberg formulae. The discovery of this elementary proof put to rest the discussion of the relative profoundness of complex analysis versus real analysis. However, despite the brilliance of the Selberg-Erdős approach, it did not produce the startling consequences in understanding both the distribution of primes and the zeros of the Riemann zeta function that were predicted. There are now many so-called elementary proofs, and the techniques involved have become standard in analytic number theory. It may be that in time these methods will lead to a deeper understanding of the basic questions.

In this section, we will state the Selberg formulae (without proof) and then outline 
(also without proof) how this formula leads to a proof of the prime number theorem. A 
complete exposition of Selberg’s original proof can be found in the book of Nathanson 
[N] while a self-contained exposition of another elementary proof is in the book 
of Tenenbaum and Mendes-France [TMF]. A slightly different approach based on 
Selberg’s methods can also be found in Hardy and Wright [HW]. 

The Selberg formula from which the elementary proof can be derived is the fol- 
lowing. 


Theorem 4.6.1 (Selberg Formula) For $x > 1$,

$$\sum_{p \le x} (\ln p)^2 + \sum_{pq \le x} \ln p \ln q = 2x \ln x + O(x),$$

where $p, q$ run over all the primes $\le x$.


Several alternative formulations of this result are used in the elementary proof. 
First, the formula can be expressed in terms of the von Mangoldt function which we 
used in our other (nonelementary) proof. In particular: 


Theorem 4.6.2 (Selberg Formula) For $x > 1$,

$$\sum_{n \le x} \Lambda(n) \ln n + \sum_{nm \le x} \Lambda(n) \Lambda(m) = 2x \ln x + O(x),$$

where $\Lambda(n)$ is the von Mangoldt function.


To show that these are equivalent, the two sums are considered separately. We give a partial demonstration. Consider the first sum $\sum_{n \le x} \Lambda(n) \ln n$. Since $\Lambda(n) = 0$ if $n \neq p^k$ for a prime $p$, and $\Lambda(p^k) = \ln p$, we have

$$\sum_{n \le x} \Lambda(n) \ln n = \sum_{p \le x} (\ln p)^2 + \sum_{p^k \le x,\ k \ge 2} k (\ln p)^2.$$

If $p^k \le x$ with $k \ge 2$, then $p \le \sqrt{x}$ and $k \le \ln x / \ln p$. Hence

$$\sum_{p^k \le x,\ k \ge 2} k (\ln p)^2 \le \sum_{p \le \sqrt{x}} (\ln p)^2 \sum_{k=2}^{\ln x/\ln p} k \le \sum_{p \le \sqrt{x}} (\ln p)^2 \left(\frac{\ln x}{\ln p}\right)^2 \le \sqrt{x}\, (\ln x)^2.$$

However, clearly

$$\sqrt{x}\, (\ln x)^2 = O(x),$$

and therefore it follows that

$$\sum_{n \le x} \Lambda(n) \ln n = \sum_{p \le x} (\ln p)^2 + O(x).$$
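The estimate just derived can be checked numerically. The sketch below (helper names are ours) computes the difference $\sum_{n \le x} \Lambda(n) \ln n - \sum_{p \le x} (\ln p)^2$, which is exactly the contribution of the prime powers $p^k \le x$ with $k \ge 2$, and compares it with the bound $\sqrt{x}\,(\ln x)^2$.

```python
import math


def smallest_prime_factor_table(N):
    """spf[n] = smallest prime factor of n for 2 <= n <= N."""
    spf = list(range(N + 1))
    for p in range(2, math.isqrt(N) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def prime_power_gap(x):
    """(sum_{n<=x} Lambda(n) ln n) - (sum_{p<=x} (ln p)^2), for integer x >= 2.

    Only n = p^k with k >= 2 contribute, each with Lambda(n) ln n = k (ln p)^2.
    """
    spf = smallest_prime_factor_table(x)
    gap = 0.0
    for n in range(2, x + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1 and n != p:                 # n is a prime power p^k, k >= 2
            gap += math.log(p) * math.log(n)
    return gap


for x in (10**2, 10**3, 10**4):
    print(x, round(prime_power_gap(x), 2), round(math.sqrt(x) * math.log(x) ** 2, 2))
```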


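In a similar manner (see the outline in the exercises)

$$\sum_{nm \le x} \Lambda(n) \Lambda(m) = \sum_{pq \le x} \ln p \ln q + O(x).$$

Hence for $x > 1$,

$$\sum_{n \le x} \Lambda(n) \ln n + \sum_{nm \le x} \Lambda(n) \Lambda(m) = 2x \ln x + O(x)$$

if and only if

$$\sum_{p \le x} (\ln p)^2 + \sum_{pq \le x} \ln p \ln q = 2x \ln x + O(x).$$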


Therefore, the two versions given of Selberg’s formula are equivalent. 
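Theorem 4.6.2 can be illustrated numerically. The following Python sketch is our own illustration, not part of Selberg's argument: for integer $x$ it evaluates the left side $S(x)$ of Theorem 4.6.2 exactly, using $\sum_{nm \le x} \Lambda(n)\Lambda(m) = \sum_{n \le x} \Lambda(n)\,\psi(x/n)$, and prints $(S(x) - 2x\ln x)/x$. If the $O(x)$ error term is correct, these values stay bounded as $x$ grows.

```python
import math


def von_mangoldt_table(N):
    """lam[n] = Lambda(n) for 0 <= n <= N (lam[0] and lam[1] are 0)."""
    lam = [0.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= N:                     # Lambda(p^k) = ln p
                lam[pk] = math.log(p)
                pk *= p
    return lam


def selberg_error(x):
    """(S(x) - 2x ln x) / x for integer x, with S(x) as in Theorem 4.6.2."""
    lam = von_mangoldt_table(x)
    psi = [0.0] * (x + 1)                      # psi[k] = psi(k) for integers k
    for k in range(1, x + 1):
        psi[k] = psi[k - 1] + lam[k]
    first = sum(lam[n] * math.log(n) for n in range(2, x + 1))
    # sum_{nm <= x} Lambda(n) Lambda(m) = sum_{n <= x} Lambda(n) psi(x // n)
    second = sum(lam[n] * psi[x // n] for n in range(2, x + 1))
    return (first + second - 2 * x * math.log(x)) / x


for x in (10**3, 10**4, 10**5):
    print(x, round(selberg_error(x), 3))
```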

If we introduce a generalization of the von Mangoldt function, Selberg’s formula 
can be expressed in a very succinct manner. To do this, we must introduce some 
operations on the set of arithmetic functions. 

Recall that a number theoretic function is any complex-valued function whose domain is the set of natural numbers $\mathbb{N}$ (see Section 3.6). We have introduced numerous examples of such functions: the von Mangoldt function, the Möbius function and the Euler phi function, to name just a few. On the set of number theoretic functions we define addition pointwise in the standard way. That is, if $f(n)$, $g(n)$ are number theoretic functions, then

$$(f + g)(n) = f(n) + g(n).$$

The function given by $0(n) = 0$ for all $n \in \mathbb{N}$ is then an additive identity for this addition.
We define a multiplication in the following manner.


Definition 4.6.1 If $f(n)$, $g(n)$ are number theoretic functions, then their Dirichlet convolution is the number theoretic function given by

$$(f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right).$$

If we define

$$\delta(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \ge 2, \end{cases}$$

then $\delta(n)$ is a multiplicative identity for Dirichlet convolution. With these operations the set of number theoretic functions becomes a ring.
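Dirichlet convolution is easy to experiment with. In the sketch below, a number theoretic function restricted to $\{1, \ldots, N\}$ is represented by a Python list indexed from 0 to $N$ (index 0 unused); that representation and the function name are our own choices. The checks confirm, on a finite range only, that $\delta$ acts as an identity and that the product is commutative.

```python
def dirichlet_convolution(f, g, N):
    """Return h = f * g with h[n] = sum_{d | n} f[d] * g[n // d] for 1 <= n <= N.

    f and g are lists indexed 0..N; index 0 is unused.
    """
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):     # every product d * m <= N
            h[d * m] += f[d] * g[m]
    return h


N = 50
delta = [0] * (N + 1)
delta[1] = 1                                # delta(1) = 1, delta(n) = 0 for n >= 2
one = [0] + [1] * N                         # the constant function 1
ident = [0] + list(range(1, N + 1))         # the function n -> n

assert dirichlet_convolution(delta, ident, N) == ident    # delta is an identity
assert dirichlet_convolution(one, ident, N) == dirichlet_convolution(ident, one, N)
print(dirichlet_convolution(one, one, N)[12])    # number of divisors of 12: 6
print(dirichlet_convolution(one, ident, N)[12])  # sum of divisors of 12: 28
```

Looping over pairs $(d, m)$ with $dm \le N$ avoids searching for divisors of each $n$ separately.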


Theorem 4.6.3 The set of number theoretic functions with addition defined point- 
wise and multiplication given by Dirichlet convolution forms a commutative ring 
with an identity. 


The proof is a straightforward calculation (see the exercises).
We need the idea of Möbius inversion (see Section 3.6). Recall that the Möbius function $\mu$ is defined for natural numbers $n$ by

$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^k & \text{if } n = p_1 p_2 \cdots p_k \text{ with } p_1, \ldots, p_k \text{ distinct primes}, \\ 0 & \text{otherwise.} \end{cases}$$

For number theoretic functions we then have the following formula, known as the Möbius inversion formula, which was stated and proved in Section 3.6.


Theorem 4.6.4 (Theorem 3.6.4) (Möbius Inversion Formula) Let $f(n)$ be a number theoretic function. Define

$$g(n) = \sum_{d \mid n} f(d).$$

Then

$$f(n) = \sum_{d \mid n} \mu(d)\, g\!\left(\frac{n}{d}\right).$$
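The Möbius function and the inversion formula translate directly into code. The sketch below (all names are hypothetical helpers of ours) builds $\mu$ with a sieve, forms $g(n) = \sum_{d \mid n} f(d)$ for an arbitrary test function $f$, and recovers $f$ from $g$ exactly, in integer arithmetic.

```python
def mobius_table(N):
    """mu[n] for 1 <= n <= N (mu[0] is unused)."""
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):          # one sign change per prime factor
                mu[m] = -mu[m]
            for m in range(p * p, N + 1, p * p):  # divisible by p^2: not squarefree
                mu[m] = 0
    return mu


def divisor_sum(f, N):
    """g(n) = sum_{d | n} f(d) for 1 <= n <= N."""
    g = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(d, N + 1, d):
            g[m] += f[d]
    return g


def mobius_invert(g, mu, N):
    """f(n) = sum_{d | n} mu(d) g(n / d) for 1 <= n <= N."""
    f = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            f[d * m] += mu[d] * g[m]
    return f


N = 100
mu = mobius_table(N)
f = [0] + [n * n - 3 * n + 7 for n in range(1, N + 1)]   # an arbitrary test function
g = divisor_sum(f, N)
assert mobius_invert(g, mu, N) == f
print(mu[1:11])   # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
```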


Based on Dirichlet convolution and using Möbius inversion, we define a generalization of the von Mangoldt function. First, define

$$L(n) = \ln n \quad \text{for all } n \in \mathbb{N}.$$

We then have:
Lemma 4.6.1 $\Lambda(n) = (\mu * L)(n)$, where $\mu$ is the Möbius function.


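Proof Let $\mathbf{1}(n) = 1$ for all $n \in \mathbb{N}$. Then, if $n = p_1^{e_1} \cdots p_k^{e_k}$, we have

$$(\mathbf{1} * \Lambda)(n) = \sum_{d d_1 = n} \mathbf{1}(d_1)\, \Lambda(d) = \sum_{d \mid n} \Lambda(d) = e_1 \ln p_1 + \cdots + e_k \ln p_k = \ln n = L(n).$$

Therefore $\mathbf{1} * \Lambda = L$, and so from the Möbius inversion formula

$$\mu * L = \Lambda.$$

Definition 4.6.2 For each $r \ge 1$ define the generalized von Mangoldt function

$$\Lambda_r = \mu * L^r,$$

where $L^r(n) = (\ln n)^r$.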


The tie to the Selberg formula is the following. 


Lemma 4.6.2. For each natural number $n$,

$$\Lambda_2(n) = \Lambda(n) \ln n + (\Lambda * \Lambda)(n).$$
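Both lemmas can be confirmed numerically on a finite range. This self-contained sketch (function names are ours) computes $\mu * L$ and $\Lambda_2 = \mu * L^2$ by direct convolution and compares them with $\Lambda$ and with $\Lambda L + \Lambda * \Lambda$. It is a finite check in floating point, not a proof.

```python
import math


def dirichlet_convolution(f, g, N):
    """h[n] = sum_{d | n} f[d] * g[n // d] for 1 <= n <= N (index 0 unused)."""
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h


def mobius_table(N):
    """mu[n] for 1 <= n <= N, built with a sieve."""
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] = -mu[m]
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu


def von_mangoldt(n):
    """Lambda(n) = ln p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)   # smallest prime factor
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


N = 300
mu = mobius_table(N)
L = [0.0] + [math.log(n) for n in range(1, N + 1)]         # L(n) = ln n
L2 = [0.0] + [math.log(n) ** 2 for n in range(1, N + 1)]   # L^2(n) = (ln n)^2
lam = [von_mangoldt(n) for n in range(N + 1)]

lam_via_mu = dirichlet_convolution(mu, L, N)     # Lemma 4.6.1: mu * L
lam2 = dirichlet_convolution(mu, L2, N)          # Definition 4.6.2 with r = 2
lam_lam = dirichlet_convolution(lam, lam, N)     # Lambda * Lambda

print(max(abs(lam_via_mu[n] - lam[n]) for n in range(1, N + 1)))
print(max(abs(lam2[n] - (lam[n] * L[n] + lam_lam[n])) for n in range(1, N + 1)))
```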


Selberg's formula can now be expressed concisely as

Theorem 4.6.5 (Selberg formula) For all $x > 1$,

$$\sum_{n \le x} \Lambda_2(n) = 2x \ln x + O(x).$$


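The elementary proof requires two more equivalent formulations, which tie the Selberg formula to the Chebyshev functions $\theta(x)$ and $\psi(x)$.

Theorem 4.6.6 (Selberg Formula) For $x > 1$,

(1)
$$\theta(x) \ln x + \sum_{p \le x} \ln p \; \theta\!\left(\frac{x}{p}\right) = 2x \ln x + O(x),$$

(2)
$$\psi(x) \ln x + \sum_{n \le x} \Lambda(n)\, \psi\!\left(\frac{x}{n}\right) = 2x \ln x + O(x).$$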
In Theorem 4.3.2, we showed that the prime number theorem is equivalent to $\theta(x) \sim x$ and to $\psi(x) \sim x$. In our earlier (nonelementary) proof we actually showed that $\psi(x) \sim x$ to establish the prime number theorem. In Selberg's elementary proof he showed that $\theta(x) \sim x$. In particular, if we let $R(x) = \theta(x) - x$, then the Selberg proof shows that $R(x) = o(x)$, which clearly implies that $\theta(x) \sim x$. More precisely, in the proof it is shown that there exist sequences $(a_n)$, $(b_n)$ of positive real numbers such that

$$|R(x)| < a_n x \quad \text{for all } x > b_n,$$

and $\lim_{n\to\infty} a_n = 0$.

This is proved via a series of estimates whose proofs all work with, or start with, 
the Selberg formula (in one of its formulations), and then use tricky and difficult 
manipulation of series. The lengthy details of a completely elementary (again not 
simple but no complex analysis) proof due to Selberg can be found in the book of 
Nathanson [N]. A separate proof along the same lines but using some analysis is in 
the book of Hardy and Wright [HW]. Finally, a separate elementary proof (again 
using some analysis) is in the notes of Tenenbaum and Mendes-France [TMF]. 

It is an easy consequence of the prime number theorem that if $p_n$ is the $n$th prime, then

$$\lim_{n \to \infty} \frac{p_{n+1}}{p_n} = 1. \tag{4.6.1}$$

This fact, however, plays a role in the history of the elementary proof. When Selberg first gave his formula, Erdős used it to give an elementary proof of (4.6.1). Selberg then used his formula along with the methods of Erdős' proof to develop the first elementary proof of the prime number theorem. Erdős then gave a second elementary proof. There now exist several elementary proofs of the prime number theorem that do not depend on Selberg's formula. A nice survey on the use of elementary methods in the study of primes was written by Diamond [Di].


4.7 Multiple Zeta Values 


Throughout this chapter we have seen the importance of the zeta function

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$

For real values of $s$ this is the Euler zeta function, and we presented Euler's proof of the infinitude of primes based on this. For complex values of $s$ this is the Riemann zeta function, and it was essential in the proof of the prime number theorem. Recall that Riemann reduced the proof of the prime number theorem to showing that the zeros of the Riemann zeta function are off the line $\mathrm{Re}\, s = 1$. The Riemann hypothesis
developed from Riemann's work and states that all the nontrivial zeros of the Riemann zeta function are on the line $\mathrm{Re}\, s = \frac{1}{2}$ (see Section 4.4). The Riemann hypothesis is among the most important open problems in mathematics. However, there are many other important problems concerning $\zeta(s)$, and there is an entire line of research devoted to these as well as to generalizations of $\zeta(s)$. In this section, we introduce and briefly discuss a generalization of $\zeta(s)$ called multiple zeta values or MZV. This generalization, besides being of independent interest, also sheds light on the zeta function itself. For further information, as well as for the proofs of the results in this section, we refer to the paper by Burgos Gil, Fresan and Kühn [BFK], or the survey articles [Wa] and [Zud]. Our short discussion follows the description in [BFK].

Before introducing the MZV, we look back at certain results on the zeta function and repeat some of the material that we looked at in Section 4.4. Euler considered the problem of determining the value of $\zeta(m)$ for an integer $m$. The Basel problem, solved by Euler in 1735, asked for the value of $\zeta(2)$. In Section 4.4, we showed that $\zeta(2) = \pi^2/6$ by using Fourier series.

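However, as mentioned, Euler proved a great deal more. In particular, he determined the values of $\zeta(m)$ for all even integers $m = 2k$. The result depends on the Bernoulli numbers, which are rational numbers defined by the power series identity

$$\frac{t}{e^t - 1} = \sum_{k=0}^{\infty} B_k \frac{t^k}{k!}. \tag{4.7.1}$$

Note that the function

$$f(t) = \frac{t}{e^t - 1} + \frac{t}{2} = \frac{t\,(1 + e^t)}{2\,(e^t - 1)}$$

is even, that is, $f(t) = f(-t)$. It follows that $B_1 = -\frac{1}{2}$ and $B_k = 0$ for all odd integers $k \ge 3$. The first Bernoulli numbers are easily computed:

$$\begin{array}{c|cccccc} k & 2 & 4 & 6 & 8 & 10 & 12 \\ \hline B_k & \frac{1}{6} & -\frac{1}{30} & \frac{1}{42} & -\frac{1}{30} & \frac{5}{66} & -\frac{691}{2730} \end{array}$$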
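The table can be reproduced with exact rational arithmetic. The sketch below is our own: comparing coefficients of $t^{m+1}$ in $t = (e^t - 1)\sum_k B_k t^k/k!$ gives the recurrence $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for $m \ge 1$, which determines each $B_m$ from the earlier ones.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Exact B_0, ..., B_n for t/(e^t - 1) = sum_k B_k t^k / k!  (so B_1 = -1/2).

    Solves sum_{j=0}^{m} C(m+1, j) B_j = 0 for B_m, m = 1, ..., n.
    """
    B = [Fraction(0)] * (n + 1)
    B[0] = Fraction(1)
    for m in range(1, n + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B[m] = -s / (m + 1)
    return B


B = bernoulli_numbers(12)
for k in range(2, 13, 2):
    print(k, B[k])   # 1/6, -1/30, 1/42, -1/30, 5/66, -691/2730
```

Using `Fraction` keeps every value exact, so the output can be compared entry by entry with the table.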


We now give Euler’s theorem and a general proof which, given a certain trigono- 
metric identity, is straightforward. 


Theorem 4.7.1 (Euler, 1735) The values of the zeta function at even positive integers are given by

$$\zeta(2k) = (-1)^{k+1} \frac{(2\pi)^{2k}\, B_{2k}}{2\,(2k)!}, \qquad k \ge 1,$$

where $B_{2k}$ are the Bernoulli numbers.
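Proof The key ingredient is the following identity for the cotangent function, also due to Euler (see exercises): for $x \in \mathbb{C} \setminus \mathbb{Z}$,

$$\pi \cot(\pi x) = \frac{1}{x} + \sum_{n=1}^{\infty} \frac{2x}{x^2 - n^2}. \tag{4.7.2}$$

Expanding the quotient inside the sum sign as a geometric series (for $|x| < 1$) and interchanging the order of summation, we obtain

$$\pi \cot(\pi x) = \frac{1}{x} - 2 \sum_{k=1}^{\infty} \zeta(2k)\, x^{2k-1}. \tag{4.7.3}$$

On the other hand, we have

$$\frac{1}{e^z - 1} = \frac{e^{-z/2}}{e^{z/2} - e^{-z/2}} \quad \text{and} \quad \frac{e^z}{e^z - 1} = \frac{e^{z/2}}{e^{z/2} - e^{-z/2}},$$

from which the identity

$$\frac{z}{2} \cdot \frac{e^{z/2} + e^{-z/2}}{e^{z/2} - e^{-z/2}} = \frac{z\,(1 + e^z)}{2\,(e^z - 1)} = 1 + \sum_{k=1}^{\infty} B_{2k} \frac{z^{2k}}{(2k)!}$$

follows, using (4.7.1), $B_1 = -\frac{1}{2}$ and the vanishing of $B_k$ for odd $k \ge 3$. Taking $z = 2\pi i x$ gives

$$\pi \cot(\pi x) = \pi i\, \frac{e^{\pi i x} + e^{-\pi i x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{1}{x} + \sum_{k=1}^{\infty} \frac{(2\pi i)^{2k} B_{2k}}{(2k)!}\, x^{2k-1}, \tag{4.7.4}$$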


and we conclude by identifying the coefficients in (4.7.3) and (4.7.4). 
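Theorem 4.7.1 can be checked numerically by comparing Euler's closed form with partial sums of the series. The sketch below is a self-contained illustration (function names are ours); it computes the Bernoulli numbers exactly from the recurrence $\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0$, which follows from (4.7.1), and only then switches to floating point.

```python
import math
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Exact B_0..B_n (convention B_1 = -1/2) via sum_{j<=m} C(m+1, j) B_j = 0."""
    B = [Fraction(1)] + [Fraction(0)] * n
    for m in range(1, n + 1):
        B[m] = -sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1)
    return B


def zeta_even(k, B):
    """Euler: zeta(2k) = (-1)^(k+1) (2 pi)^(2k) B_{2k} / (2 (2k)!)."""
    return (-1) ** (k + 1) * (2 * math.pi) ** (2 * k) * float(B[2 * k]) / (
        2 * math.factorial(2 * k)
    )


def zeta_partial(s, terms):
    """Partial sum of the defining series: sum_{n <= terms} n^(-s)."""
    return sum(n ** -s for n in range(1, terms + 1))


B = bernoulli_numbers(20)
for k in range(1, 6):
    print(2 * k, zeta_even(k, B), zeta_partial(2 * k, 100000))
```

For $k = 1$ the partial sum agrees only to about five decimals, because the tail of $\sum 1/n^2$ beyond $N$ is roughly $1/N$; for larger $k$ the agreement is essentially to machine precision.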


Euler's formula implies the following equality of subrings of $\mathbb{R}$:

$$\mathbb{Q}[\zeta(2), \zeta(4), \ldots] = \mathbb{Q}[\pi^2].$$

Thanks to the functional equation (see Section 4.4)

$$\pi^{-s/2}\, \Gamma\!\left(\frac{s}{2}\right) \zeta(s) = \pi^{-(1-s)/2}\, \Gamma\!\left(\frac{1-s}{2}\right) \zeta(1-s),$$

it follows that $\zeta(-k) = -\dfrac{B_{k+1}}{k+1}$ for all $k \ge 1$.

By contrast, no one has been able to determine closed analytic formulas for the values of the zeta function at $s = 3, 5, 7, \ldots$ in terms of previously known real numbers like $\pi$. This leads to the following conjecture:
Conjecture 4.7.1 The numbers $\pi, \zeta(3), \zeta(5), \ldots$ are algebraically independent.

Real numbers $\kappa_1, \kappa_2, \ldots, \kappa_n$ are algebraically independent if

$$P(\kappa_1, \ldots, \kappa_n) \neq 0$$

for each nonzero polynomial $P \in \mathbb{Z}[x_1, \ldots, x_n]$; an infinite sequence of real numbers is algebraically independent if its first $n$ terms are, for each $n \ge 1$.

This conjecture seems to be completely out of reach. The transcendence of $\pi$ was proved by Lindemann in 1882. By Euler's result it follows that the numbers $\zeta(2k)$ are transcendental. However, we do not know whether $\zeta(3)$ is transcendental, not to speak of its algebraic independence from $\pi$. There are a few known results about the nature of the numbers $\zeta(2k + 1)$. We summarize them (see [BFK]):


• Apéry proved that $\zeta(3)$ is irrational. Different proofs are known, but no one has been able to generalize them to show, for example, that $\zeta(5)$ is irrational.

• Rivoal and Ball and Rivoal proved that, if $n$ is an odd integer $\ge 3$, then

$$\dim_{\mathbb{Q}} \langle 1, \zeta(3), \zeta(5), \ldots, \zeta(n) \rangle_{\mathbb{Q}} \ge \frac{1}{3} \log(n).$$

In particular, infinitely many $\zeta(2k + 1)$ are irrational.

• Zudilin proved that at least one of the four numbers $\zeta(5)$, $\zeta(7)$, $\zeta(9)$ and $\zeta(11)$ is irrational.


Besides the algebraic independence conjecture, the values of the Riemann zeta 
function at the integers are linked to many other interesting problems in mathematics 
(see [BFK]). 

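In order to investigate possible relations among zeta values, Euler examined the algebraic structure of these numbers. If we multiply two Riemann zeta values we obtain a new kind of interesting sum:

$$\begin{aligned} \zeta(s_1) \cdot \zeta(s_2) &= \left(\sum_{n_1 \ge 1} \frac{1}{n_1^{s_1}}\right) \left(\sum_{n_2 \ge 1} \frac{1}{n_2^{s_2}}\right) = \sum_{n_1, n_2 \ge 1} \frac{1}{n_1^{s_1} n_2^{s_2}} \\ &= \sum_{n_1 > n_2 \ge 1} \frac{1}{n_1^{s_1} n_2^{s_2}} + \sum_{n_2 > n_1 \ge 1} \frac{1}{n_2^{s_2} n_1^{s_1}} + \sum_{n_1 = n_2 \ge 1} \frac{1}{n_1^{s_1 + s_2}}. \end{aligned} \tag{4.7.5}$$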


The first two terms in the last line are called double zeta values and have the representations

$$\zeta(s_1, s_2) = \sum_{n_1 > n_2 \ge 1} \frac{1}{n_1^{s_1} n_2^{s_2}} = \sum_{m, n \ge 1} \frac{1}{(n + m)^{s_1}\, n^{s_2}}.$$


With this notation, equation (4.7.5) can be rewritten as

$$\underbrace{\zeta(s_1) \cdot \zeta(s_2)}_{\text{product of zeta values}} = \underbrace{\zeta(s_1, s_2) + \zeta(s_2, s_1) + \zeta(s_1 + s_2)}_{\text{linear combination of zeta and double zeta values}} \tag{4.7.6}$$
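Because (4.7.5) only splits the set of index pairs $(n_1, n_2)$ into the three regions $n_1 > n_2$, $n_2 > n_1$ and $n_1 = n_2$, the identity (4.7.6) already holds exactly for the sums truncated at any $N$. The sketch below (our own illustration) checks this with exact fractions; truncated sums make sense even when $s_2 = 1$, where the untruncated series would diverge.

```python
from fractions import Fraction


def zeta_trunc(s, N):
    """sum_{1 <= n <= N} n^(-s), computed exactly."""
    return sum(Fraction(1, n ** s) for n in range(1, N + 1))


def double_zeta_trunc(s1, s2, N):
    """sum over N >= n1 > n2 >= 1 of n1^(-s1) n2^(-s2), computed exactly."""
    total = Fraction(0)
    inner = Fraction(0)                 # running value of sum_{n2 < n1} n2^(-s2)
    for n1 in range(1, N + 1):
        total += inner / n1 ** s1
        inner += Fraction(1, n1 ** s2)
    return total


def stuffle_holds(s1, s2, N):
    """Check (4.7.6) for the sums truncated at N; it should hold for every N."""
    lhs = zeta_trunc(s1, N) * zeta_trunc(s2, N)
    rhs = (double_zeta_trunc(s1, s2, N) + double_zeta_trunc(s2, s1, N)
           + zeta_trunc(s1 + s2, N))
    return lhs == rhs


print(stuffle_holds(2, 3, 25), stuffle_holds(2, 2, 25), stuffle_holds(3, 1, 25))
```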


As we have seen, products of two zeta values are linear combinations of zeta and 
double zeta values. To handle products of more factors we shall need to generalize 
double zeta values to multiple zeta values (MZV). These new numbers satisfy many 
linear relations with rational coefficients and the main goal of the theory of MZV is 
to fully understand them. 

We now define multiple zeta values and begin to study their basic properties. Of great importance is that multiple zeta values can be written both as series and as integrals. There are two important operations used in the study of MZV, the stuffle product and the shuffle product. From the series representation of MZV one obtains the stuffle product, whereas the integral representation gives the shuffle product. By comparing both products one obtains many relations among multiple zeta values. We will not delve into these products in this short introduction. The definitions of the stuffle and shuffle product can be found in [BFK].

In order to define MZV we introduce the following terminology: a tuple

$$\mathbf{s} = (s_1, \ldots, s_l) \in \mathbb{Z}^l$$

is said to be positive if $s_i \ge 1$ for all $i = 1, \ldots, l$, and admissible if, in addition, $s_1 \ge 2$. By convention, the empty tuple will also be considered to be admissible.


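Definition 4.7.1 Let $\mathbf{s} = (s_1, s_2, \ldots, s_l) \in \mathbb{Z}^l$ be an admissible tuple. The multiple zeta value associated to $\mathbf{s}$ is the real number

$$\zeta(\mathbf{s}) = \zeta(s_1, \ldots, s_l) = \sum_{n_1 > n_2 > \cdots > n_l \ge 1} \frac{1}{n_1^{s_1} n_2^{s_2} \cdots n_l^{s_l}},$$

with the convention $\zeta(\emptyset) = 1$.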


Note that if $\mathbf{s}$ is an admissible tuple, then the series $\zeta(\mathbf{s})$ converges absolutely. We define the weight of $\zeta(\mathbf{s})$ as $s_1 + s_2 + \cdots + s_l$ and the length of $\zeta(\mathbf{s})$ (also called the depth in the literature) as $l$, and we write

$$\mathrm{wt}(\zeta(\mathbf{s})) = \mathrm{wt}(\mathbf{s}) = s_1 + \cdots + s_l, \qquad l(\zeta(\mathbf{s})) = l(\mathbf{s}) = l.$$

In particular, $\mathrm{wt}(1) = l(1) = 0$.
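Definition 4.7.1 can be turned into a numerical approximation by truncating the outer index at $N$. The sketch below is our own; it evaluates the nested sum in about $lN$ operations by accumulating the inner sums from the innermost index outwards. When $s_1 = 2$ the truncation error decays only roughly like $(\ln N)^{l-1}/N$, so a relation such as Euler's $\zeta(3) = \zeta(2, 1)$ can be confirmed to a few decimal places only.

```python
import math


def mzv_truncated(s, N):
    """Approximate zeta(s_1, ..., s_l) by the finite sum over N >= n_1 > ... > n_l >= 1.

    After processing s_j, A[n] holds the sum of n_j^(-s_j) * ... * n_l^(-s_l)
    over all chains n = n_j > n_{j+1} > ... > n_l >= 1.
    """
    if len(s) == 0:
        return 1.0                                   # convention zeta(empty) = 1
    if s[0] < 2 or any(si < 1 for si in s):
        raise ValueError("tuple is not admissible")
    A = [0.0] + [n ** -s[-1] for n in range(1, N + 1)]
    for sj in reversed(s[:-1]):
        prefix = 0.0                                 # sum of the previous A[m], m < n
        new_A = [0.0] * (N + 1)
        for n in range(1, N + 1):
            new_A[n] = prefix * n ** -sj
            prefix += A[n]
        A = new_A
    return sum(A)


def weight(s):
    return sum(s)


def length(s):
    return len(s)


z21 = mzv_truncated((2, 1), 10**5)
z3 = mzv_truncated((3,), 10**5)
print(weight((2, 1)), length((2, 1)), z21, z3)
```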


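Definition 4.7.2 We will denote by $\mathcal{Z}$ the $\mathbb{Q}$-vector space generated by all multiple zeta values

$$\mathcal{Z} = \mathbb{Q}[\mathrm{MZV}] = \langle 1, \zeta(2), \zeta(3), \zeta(2, 1), \zeta(4), \ldots \rangle_{\mathbb{Q}}.$$

We also define the following subvector spaces of $\mathcal{Z}$:

$$\begin{aligned} \mathcal{Z}_k &= \langle \zeta(\mathbf{s}) \mid \mathrm{wt}(\mathbf{s}) = k \rangle_{\mathbb{Q}}, \\ F_l \mathcal{Z} &= \langle \zeta(\mathbf{s}) \mid l(\mathbf{s}) \le l \rangle_{\mathbb{Q}}, \\ F_l \mathcal{Z}_k &= \langle \zeta(\mathbf{s}) \mid \mathrm{wt}(\mathbf{s}) = k,\ l(\mathbf{s}) \le l \rangle_{\mathbb{Q}}. \end{aligned}$$

Observe that there is an obvious inclusion $F_l \mathcal{Z}_k \subseteq F_l \mathcal{Z} \cap \mathcal{Z}_k$. This is actually expected to be an equality, but this is not known so far.
The product formula (4.7.6) is a first indication that the $\mathbb{Q}$-vector space $\mathcal{Z}$ has the structure of an algebra.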


Theorem 4.7.2 Multiplication of real numbers induces an algebra structure on $\mathcal{Z}$ which is compatible with the weight and the length, that is:

$$F_{l_1} \mathcal{Z}_{k_1} \cdot F_{l_2} \mathcal{Z}_{k_2} \subseteq F_{l_1 + l_2} \mathcal{Z}_{k_1 + k_2}$$

for any integers $l_1$, $l_2$, $k_1$ and $k_2$.

The theorem shows, in particular, that every product of zeta or multiple zeta values can be written as a linear combination of MZV.


Corollary 4.7.1 Every polynomial relation between Riemann zeta values $\zeta(k)$ gives rise to a linear relation between multiple zeta values.


Thus, the problem of finding algebraic relations among zeta values is reduced to 
the problem of finding linear relations among MZV: we have linearized the algebraic 
independence conjecture. 

The task of finding linear relations among multiple zeta values by elementary 
methods was first done by reordering multiple sums by means of a partial fraction 
decomposition. Nielsen proved the following reduction formula in 1906. 


Theorem 4.7.3 (Reduction formula, Nielsen 1906). Let $p \ge 2$ and $q \ge 1$ be integers. Then
(p,q) = Se IEP C@ — 6p +k) 


p-2 


+EDI DV (GEN -— kg +h 


k=0 
+I +9) + Cp +a —-1, DI 
Setting $q = 1$ we immediately get:

Corollary 4.7.2 (Euler's sum formula). If $s \ge 3$, then

$$\zeta(s) = \sum_{j=1}^{s-2} \zeta(s - j, j). \tag{4.7.7}$$

In particular, $\zeta(3) = \zeta(2, 1)$.


This ties the individual values of the zeta function to the values of the MZV. 
Nielsen extended this. 


Corollary 4.7.3 (see [BFK]) If $n \ge 2$, the following equalities hold:

$$\sum_{r=1}^{n-1} \zeta(2r, 2n - 2r) = \frac{3}{4}\, \zeta(2n),$$

$$\sum_{r=1}^{n-1} \zeta(2r + 1, 2n - 2r - 1) = \frac{1}{4}\, \zeta(2n).$$


Corollary 4.7.4 In $\mathcal{Z}$ we have the following linear relations:

1. in weight 3:
$$\zeta(3) = \zeta(2, 1).$$

2. in weight 4:
$$\zeta(4) = 4\zeta(3, 1), \qquad \zeta(2, 2) = 3\zeta(3, 1).$$

3. in weight 5:
$$\zeta(5) = -4\zeta(4, 1) + 2\zeta(2, 3), \qquad \zeta(3, 2) = \zeta(2, 3) - 5\zeta(4, 1).$$

4. in weight 6:
$$\zeta(6) = 4\zeta(5, 1) + 4\zeta(3, 3), \qquad \zeta(2, 4) = \frac{13}{3}\zeta(5, 1) + \frac{7}{3}\zeta(3, 3),$$
$$\zeta(4, 2) = -\frac{4}{3}\zeta(5, 1) + \frac{2}{3}\zeta(3, 3).$$
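The relations in Corollary 4.7.4 can be tested numerically with truncated double sums. In the sketch below (our own illustration, not taken from [BFK]), each printed residual should be close to 0; what remains, of order $10^{-5}$, comes from truncating the slowly converging series with $s_1 = 2$ at $N = 10^5$.

```python
import math


def double_zeta(s1, s2, N=100000):
    """Truncated zeta(s1, s2) = sum over N >= n1 > n2 >= 1 of n1^(-s1) n2^(-s2)."""
    total, inner = 0.0, 0.0
    for n in range(1, N + 1):
        total += inner / n ** s1      # inner = sum_{m < n} m^(-s2)
        inner += 1.0 / n ** s2
    return total


def zeta(s, N=100000):
    """Truncated zeta(s) = sum_{n <= N} n^(-s)."""
    return sum(1.0 / n ** s for n in range(1, N + 1))


z = {(a, b): double_zeta(a, b) for (a, b) in
     [(3, 1), (2, 2), (4, 1), (2, 3), (3, 2), (5, 1), (3, 3), (2, 4), (4, 2)]}

checks = {
    "zeta(4) = 4 zeta(3,1)": zeta(4) - 4 * z[3, 1],
    "zeta(2,2) = 3 zeta(3,1)": z[2, 2] - 3 * z[3, 1],
    "zeta(5) = -4 zeta(4,1) + 2 zeta(2,3)": zeta(5) - (-4 * z[4, 1] + 2 * z[2, 3]),
    "zeta(3,2) = zeta(2,3) - 5 zeta(4,1)": z[3, 2] - (z[2, 3] - 5 * z[4, 1]),
    "zeta(6) = 4 zeta(5,1) + 4 zeta(3,3)": zeta(6) - 4 * (z[5, 1] + z[3, 3]),
    "zeta(2,4) = 13/3 zeta(5,1) + 7/3 zeta(3,3)":
        z[2, 4] - (13 / 3 * z[5, 1] + 7 / 3 * z[3, 3]),
    "zeta(4,2) = -4/3 zeta(5,1) + 2/3 zeta(3,3)":
        z[4, 2] - (-4 / 3 * z[5, 1] + 2 / 3 * z[3, 3]),
}
for name, residual in checks.items():
    print(f"{name:45s} residual {residual:+.1e}")
```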


Numerical experiments show the following further relations:

• $\zeta(2, 2) = \frac{3}{4}\zeta(4)$;

• $\zeta(2, 1, 1) = \zeta(4)$;

• $5\zeta(5) = 4\zeta(3, 2) + 6\zeta(2, 3)$;

• $\zeta(5) = \zeta(2, 1, 1, 1)$;

• $\zeta(4, 1) = \zeta(3, 1, 1)$;

• $\zeta(2, 1, 2) = \zeta(2, 3)$;

• $\zeta(2, 2, 1) = \zeta(3, 2)$.


From these identities we can obtain upper bounds for the dimension of the $\mathbb{Q}$-vector space $F_2 \mathcal{Z}_k$ generated by zeta and double zeta values of weight $k$:

Theorem 4.7.4 Let $k \ge 3$. Then $F_2 \mathcal{Z}_k$ is spanned by $\zeta(k)$ and $\zeta(r, k - r)$ for $2 \le r \le (k - 1)/2$. In particular, we have

$$\dim F_2 \mathcal{Z}_k \le \left\lceil \frac{k - 2}{2} \right\rceil,$$

the smallest integer greater than or equal to $\frac{k-2}{2}$.

As we have seen there are many linear relations between MZV. A major line of 
research is to determine the structure of the algebra of MZV and find all possible linear 
relations among them. After extensive experimentation by many mathematicians, no 
nontrivial linear relation between MZV of different weight has been found. That 
is, all known relations among multiple zeta values are generated by homogeneous 
relations (see [BFK]). 

An important conjecture concerning the structure of the algebra of MZV was given by Zagier. The conjecture concerns the dimension of the space of multiple zeta values, and there is large numerical evidence for it. In order to state the Zagier conjecture we introduce some Fibonacci-type numbers. Set $d_0 = 1$, $d_1 = 0$, $d_2 = 1$ and, for $k \ge 3$,

$$d_k = d_{k-2} + d_{k-3}.$$

These numbers are determined by the power series identity

$$\sum_{k=0}^{\infty} d_k\, t^k = \frac{1}{1 - t^2 - t^3}.$$
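Conjecture 4.7.2 (Zagier) (see [BFK])

1. The weight defines a grading on $\mathcal{Z}$. That is,

$$\mathcal{Z} = \bigoplus_{k \ge 0} \mathcal{Z}_k,$$

and, in particular, $\mathcal{Z}_k \cap \mathcal{Z}_{k'} = 0$ if $k \neq k'$.

2. For every $k \ge 0$, $\dim_{\mathbb{Q}} \mathcal{Z}_k = d_k$.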


There have been several extensions and refinements of Zagier's conjecture (see [BFK]). The following two results, the first by Terasoma and by Deligne and Goncharov and the second by Brown, show what is known about the Zagier conjecture and its extensions (see [BFK] for references).


Theorem 4.7.5 We have $\dim_{\mathbb{Q}} \mathcal{Z}_k \le d_k$.


Theorem 4.7.6 The MZV with only 2 and 3 in their index do generate $\mathcal{Z}$, and in particular we obtain $\dim_{\mathbb{Q}} \mathcal{Z}_k \le d_k$.
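The numbers $d_k$ and Theorem 4.7.6 fit together: an index tuple with entries in $\{2, 3\}$ and weight $k$ begins with a 2 or a 3, so the number of such tuples satisfies the same recurrence and initial values as $d_k$. The sketch below (names are ours) lists these tuples and checks their count against $d_k$, which is how a spanning set of this shape yields the bound $\dim_{\mathbb{Q}} \mathcal{Z}_k \le d_k$.

```python
def zagier_d(K):
    """d_0, ..., d_K with d_0 = 1, d_1 = 0, d_2 = 1 and d_k = d_{k-2} + d_{k-3}."""
    d = [1, 0, 1][: K + 1]
    for k in range(3, K + 1):
        d.append(d[k - 2] + d[k - 3])
    return d


def indices_with_2_and_3(k):
    """All tuples with entries in {2, 3} and weight k (all of them admissible)."""
    if k == 0:
        return [()]
    out = []
    for part in (2, 3):
        if part <= k:
            out += [(part,) + rest for rest in indices_with_2_and_3(k - part)]
    return out


d = zagier_d(15)
print(d)                          # [1, 0, 1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28]
print(indices_with_2_and_3(7))    # [(2, 2, 3), (2, 3, 2), (3, 2, 2)]
for k in range(16):
    assert len(indices_with_2_and_3(k)) == d[k]
```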


4.8 Some Extensions and Comments 


In Chapter 3, we looked at a large number of ways to prove that there are infinitely many primes, and our look led us to a large array of number theoretical ideas. Basic congruences and the fundamental theorem of arithmetic handled many of the proofs, but we used some elementary analysis to show that $\sum_p \frac{1}{p}$ diverges. We then used some more difficult analysis to prove that there are infinitely many primes in any arithmetic progression $\{an + b : n \in \mathbb{N}\}$ with $(a, b) = 1$. However, despite the fact that the set of primes is infinite, it is clear that the density of primes among the natural numbers thins out as the natural numbers get larger. In fact we showed (Theorem 2.3.2) that there are arbitrarily large gaps in the sequence of primes. Hence in this chapter we looked at the density of the sequence of primes. The major result was the prime number theorem, which says that $\pi(x) \sim \frac{x}{\ln x}$ as $x \to \infty$, where $\pi(x)$ is the number of primes less than or equal to $x$. However, we have just touched the tip of the iceberg relative to the study of the distribution of primes. In this section, we mention some further results and conjectures on primes and their distribution which are in the same spirit as the results and proofs of the last two chapters.

By far the most important open problem surrounding the distribution of primes and the Prime Number Theorem is the Riemann hypothesis. We introduced this at the end of Section 4.4, but here we repeat what we said at that point and extend somewhat our comments and observations. Recall that the Riemann zeta function was defined for all complex $s$ with $\mathrm{Re}\, s > 1$ by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
This could be continued analytically to a meromorphic function, also denoted $\zeta(s)$, which is analytic for all complex $s \neq 1$ and which has a simple pole at $s = 1$. This fact follows from the fact that $\zeta(s)$ satisfies a functional relation

$$\zeta(s) = K(s)\, \zeta(1 - s),$$

where

$$K(s) = 2^s \pi^{s-1} \sin\!\left(\frac{\pi s}{2}\right) \Gamma(1 - s).$$

This functional relation also establishes that $\zeta(s) = 0$ at all the negative even integers $-2, -4, \ldots$. These are called the trivial zeros of $\zeta(s)$. Riemann in his original paper showed that any nontrivial zeros must fall in the critical strip $0 < \mathrm{Re}\, s < 1$. He further showed that if $\zeta(s)$ has no zeros on the line $\mathrm{Re}\, s = 1$ this was sufficient to prove the prime number theorem. This final fact was proven by Hadamard and de la Vallée Poussin. In the course of this investigation Riemann conjectured that all the nontrivial zeros lie along the line $\mathrm{Re}\, s = \frac{1}{2}$, which is called the critical line. This is the common form of the Riemann hypothesis.


Riemann Hypothesis: All the nontrivial zeros of the Riemann zeta function lie
along the line $\operatorname{Re}(s) = \frac{1}{2}$.


The Riemann hypothesis has resisted solution for almost a hundred and fifty years
and has had tremendous impact on both number theory and other branches of
mathematics. Now that Fermat's last theorem has been settled, the Riemann hypothesis
can be considered the outstanding open problem in mathematics. There are various
further results concerning the Riemann hypothesis and the zeros of the zeta
function. Hardy in 1914 proved that $\zeta(s)$ has infinitely many zeros along the critical line
$\operatorname{Re} s = \frac{1}{2}$. As of 2002 it was known that at least the first billion and a half nontrivial
zeros of $\zeta(s)$ lie along the critical line.

Selberg in 1942 showed that a positive proportion of the nontrivial zeros lie along
the critical line. Levinson in 1974 improved this to show that at least $\frac{1}{3}$ of the nontrivial
zeros are on the critical line. This has subsequently been improved to show that at least 40%
of the nontrivial zeros are on the critical line.

There are several quantitative statements that are equivalent to the Riemann
hypothesis. Koch in 1901 showed that the Riemann hypothesis is equivalent to

$$\pi(x) = \operatorname{Li}(x) + O(\sqrt{x}\,\ln x) \qquad (4.8.1)$$

where $\operatorname{Li}(x)$ is the logarithmic integral function of Gauss

$$\operatorname{Li}(x) = \int_2^x \frac{1}{\ln t}\,dt.$$
In a similar manner the Riemann hypothesis can be shown to be equivalent to

$$\pi(x) = \operatorname{Li}(x) + O(x^{\frac{1}{2} + \epsilon}) \quad \text{for all } \epsilon > 0.$$

The equality (4.8.1) was also conjectured by Riemann in his original paper and
is often called the prime number theorem form of the Riemann hypothesis.

There are many other computational variations of both the prime number theorem
and the Riemann hypothesis. Many of these are discussed in the excellent book by
Crandall and Pomerance [CP]. Several of these involve the Möbius function $\mu(n)$
and Mertens' function, defined by

$$M(x) = \sum_{n \le x} \mu(n).$$

Mertens' function is related to the Riemann zeta function (see Section 4.4.3).

Von Mangoldt proved the following.


Theorem 4.8.1 The prime number theorem is equivalent to the statement

$$\sum_{n=1}^{\infty} \frac{\mu(n)}{n} = 0.$$


Further the following is also known. 


Theorem 4.8.2 If $M(x)$ is Mertens' function then:
(1) the prime number theorem is equivalent to $M(x) = o(x)$;
(2) the Riemann hypothesis is equivalent to $M(x) = O(x^{\frac{1}{2} + \epsilon})$ for any fixed $\epsilon > 0$.


One of the questions that arises from the prime number theorem is which function
exactly is the "best approximation" to $\pi(x)$. Note that for any positive real number
$a$ the function $\frac{x}{\ln x - a}$ is asymptotically equal to $\operatorname{Li}(x)$. Hence

(1) $\pi(x) \sim \frac{x}{\ln x}$,

(2) $\pi(x) \sim \frac{x}{\ln x - a}$ for $a > 0$,

(3) $\pi(x) \sim \frac{x}{\ln x - 1.08366}$ (Legendre's estimate),

(4) $\pi(x) \sim \operatorname{Li}(x)$ (Gauss),

are all equivalent to the prime number theorem. The question arises as to whether
there is an optimal value for $a$ in (2) above. Asymptotically $a = 1$ is the optimal
choice, since $\pi(x) = \frac{x}{\ln x} + \frac{x}{\ln^2 x} + O\left(\frac{x}{\ln^3 x}\right)$ and only $a = 1$ matches the second term.
Legendre's $1.08366$ nevertheless does better over most of the range in the table below,
and Gauss' $\operatorname{Li}(x)$ is better than both for large $x$. The table below compares the estimates.
x       π(x)       x/ln x     Li(x)      x/(ln x - 1.08366)   x/(ln x - 1)
10^3    168        145        178        172                  169
10^4    1229       1086       1246       1231                 1218
10^5    9592       8686       9630       9588                 9512
10^6    78498      72382      78628      78534                78030
10^7    664579     620420     664918     665138               661459
10^8    5761455    5428681    5762209    5769341              5740304
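As a rough check, the columns of the table that use only elementary functions can be recomputed. The Python sketch below is ours, not the authors': it sieves to $10^6$ and omits $\operatorname{Li}(x)$, which would need numerical integration. The helper names `prime_pi_table` and `approximations` are hypothetical.

```python
import math


def prime_pi_table(limit):
    """Return a list pi where pi[k] is the number of primes <= k (sieve of Eratosthenes)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ...; smaller multiples were crossed out earlier.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    counts, running = [], 0
    for flag in is_prime:
        running += flag
        counts.append(running)
    return counts


def approximations(x):
    """The three elementary estimates of pi(x) used in the table."""
    ln = math.log(x)
    return {
        "x/ln x": x / ln,
        "x/(ln x - 1.08366)": x / (ln - 1.08366),
        "x/(ln x - 1)": x / (ln - 1),
    }


pi = prime_pi_table(10**6)
for k in range(3, 7):
    x = 10**k
    rounded = {name: round(value) for name, value in approximations(x).items()}
    print(f"10^{k}", pi[x], rounded)
```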


Observing the table above, one notices that $\operatorname{Li}(x) > \pi(x)$. Riemann proposed
that this is true for all sufficiently large $x$. This turned out to be incorrect. In 1914
Littlewood [Li] proved the following.


Theorem 4.8.3 The difference $\pi(x) - \operatorname{Li}(x)$ assumes both positive and negative
values infinitely often.


Littlewood's proof was interesting in that it used the following technique, which
has become extremely useful in analytic number theory. First he assumed that the
Riemann hypothesis is true and proved that $\pi(x) - \operatorname{Li}(x)$ changes sign infinitely
often. He then showed that the same is true if the Riemann hypothesis is assumed
to be false. A complete but somewhat simplified proof of Littlewood's result can be
found in [P]. More recently, te Riele in 1986 [Re] showed that there are more than
$10^{180}$ consecutive integers for which $\pi(x) > \operatorname{Li}(x)$ in the range
$6.62 \times 10^{370} < x < 6.69 \times 10^{370}$.

In trying to improve the approximation to $\pi(x)$ afforded by $\operatorname{Li}(x)$,
Riemann's work suggested (see Zagier [Za]) that $\frac{\pi(x)}{x}$ would be closer to $\frac{1}{\ln x}$ (that is,
the probability of randomly choosing a prime less than $x$ would be closer to $\frac{1}{\ln x}$) if
one counted not only the primes but also the "weighted powers" of the primes. That
is, one counts a $p^2$ as half a prime, a $p^3$ as a third of a prime and so on. This would lead
to the approximation

$$\operatorname{Li}(x) \approx \pi(x) + \frac{1}{2}\pi(x^{\frac{1}{2}}) + \frac{1}{3}\pi(x^{\frac{1}{3}}) + \cdots.$$

Upon inverting this we get

$$\pi(x) \approx \operatorname{Li}(x) - \frac{1}{2}\operatorname{Li}(x^{\frac{1}{2}}) - \frac{1}{3}\operatorname{Li}(x^{\frac{1}{3}}) - \cdots.$$

Based on these ideas Riemann proposed the following explicit formula for $\pi(x)$:

$$\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n}\operatorname{Li}(x^{\frac{1}{n}}). \qquad (4.8.2)$$

The series on the right side of (4.8.2) can be shown to converge for $x > 2$ and is
called the Riemann function $R(x)$, that is,

$$R(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n}\operatorname{Li}(x^{\frac{1}{n}}), \qquad x > 2.$$

Riemann's conjecture was then that $\pi(x) = R(x)$. The equality given in (4.8.2) is
not true; however, it is asymptotically correct. That is,


Theorem 4.8.4 We have $\pi(x) \sim R(x)$, where $R(x)$ is the Riemann function.


In fact this approximation is remarkably close for large $x$. For $x = 400{,}000{,}000$
we have

$$\pi(400{,}000{,}000) = 21{,}336{,}326 \quad \text{and} \quad R(400{,}000{,}000) = 21{,}355{,}517,$$

while for $x = 1{,}000{,}000{,}000$ we have

$$\pi(1{,}000{,}000{,}000) = 50{,}847{,}534 \quad \text{and} \quad R(1{,}000{,}000{,}000) = 50{,}847{,}455.$$


Related to Riemann's explicit formula, it can be shown that the number of
nontrivial zeros of the Riemann zeta function is given asymptotically by

$$N(t) = \frac{t}{2\pi}\ln\frac{t}{2\pi} - \frac{t}{2\pi} + O(\ln t),$$

where $N(t)$ is the number of nontrivial zeros $z = \sigma + is$ with $0 < s \le t$ (under the
Riemann hypothesis these are exactly the zeros $z = \frac{1}{2} + is$ on the critical line).

There are also some surprising relationships between some physical phenomena 
and the location of the zeros of the Riemann zeta function. The article [BK] discusses 
some of these which are far afield from our present presentation. 

An entirely elementary formulation of the Riemann hypothesis is the following
(see [P]). Define a positive squarefree integer $n$ to be red if it is the product of an
even number of distinct primes (the integer 1, a product of no primes, counts as red) and
blue if it is the product of an odd number of distinct primes. Let $R(n)$ be the number of
red integers not exceeding $n$ and $B(n)$ the number of blue integers not exceeding $n$. The
Riemann hypothesis is equivalent to the statement that for any $\epsilon > 0$ there exists an
$N$ such that for all $n > N$

$$|R(n) - B(n)| < n^{\frac{1}{2} + \epsilon}.$$
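Since $\mu(n) = 1$ for red integers, $\mu(n) = -1$ for blue integers and $\mu(n) = 0$ otherwise, $R(n) - B(n)$ is exactly Mertens' function $M(n)$, so this formulation is part (2) of Theorem 4.8.2 in disguise. The Python sketch below (function names are ours) computes $\mu$ with a sieve and checks the identity for small $n$.

```python
def mobius_upto(n):
    """Return mu[0..n] with mu[k] the Mobius function of k (mu[0] is unused)."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if not is_prime[p]:
            continue
        for multiple in range(2 * p, n + 1, p):
            is_prime[multiple] = False
        for multiple in range(p, n + 1, p):
            mu[multiple] = -mu[multiple]   # one sign flip per distinct prime factor
        for multiple in range(p * p, n + 1, p * p):
            mu[multiple] = 0               # divisible by p^2, so not squarefree
    return mu


def red_blue(n, mu):
    """R(n), B(n): squarefree k <= n with an even / odd number of prime factors (1 is red)."""
    red = sum(1 for k in range(1, n + 1) if mu[k] == 1)
    blue = sum(1 for k in range(1, n + 1) if mu[k] == -1)
    return red, blue


def mertens(n, mu):
    """M(n) = mu(1) + ... + mu(n)."""
    return sum(mu[1 : n + 1])


mu = mobius_upto(10_000)
red, blue = red_blue(10_000, mu)
print(red, blue, red - blue, mertens(10_000, mu))
```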
As we mentioned in Section 4.1, if $p_n$ denotes the $n$th prime then it is a
straightforward consequence of the prime number theorem that

$$p_n \sim n \ln n$$

and hence

$$\lim_{n \to \infty} \frac{p_{n+1}}{p_n} = 1,$$

even though there are arbitrarily large gaps in the primes. It was noted in the last
section that when Selberg first gave his formula, Erdős used it to give an
elementary proof of the second fact above. Selberg then used his formula
along with the methods of Erdős' proof to develop the first elementary proof of the
prime number theorem.
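Both limits can be watched numerically. The Python sketch below (names are ours) lists the primes up to a bound and prints $p_n/(n \ln n)$ together with the largest ratio $p_{n+1}/p_n$ from $p_{1000}$ on. The first ratio approaches 1 only slowly, since $p_n$ also carries lower-order terms of size about $n \ln\ln n$.

```python
import math


def primes_upto(limit):
    """All primes <= limit, in increasing order (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k, f in enumerate(flags) if f]


primes = primes_upto(200_000)           # roughly 18,000 primes
for n in (10, 100, 1_000, 10_000):
    p_n = primes[n - 1]                 # p_1 = 2 is primes[0]
    print(n, p_n, p_n / (n * math.log(n)))

# Largest ratio p_{n+1}/p_n among consecutive primes from p_1000 = 7919 on.
worst = max(primes[i + 1] / primes[i] for i in range(999, len(primes) - 1))
print(worst)
```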

There are two well-known conjectures concerning the difference $p_{n+1} - p_n$. The
first is called Cramér's conjecture.


Cramér's Conjecture: $p_{n+1} - p_n < (1 + o(1))(\ln n)^2$.


It follows from Koch's equivalence to the Riemann hypothesis that if the Riemann
hypothesis is true then

$$p_{n+1} - p_n = O(p_n^{\frac{1}{2} + \epsilon}) \quad \text{for any fixed } \epsilon > 0.$$


The second conjecture is called Lindelöf's hypothesis.

Lindelöf's Hypothesis: $\sum_{p_n \le x} (p_{n+1} - p_n)^2 = O(x^{1 + \epsilon})$ for any fixed $\epsilon > 0$.

It can be shown that the Riemann hypothesis implies the Lindelöf hypothesis.

Dirichlet's theorem, giving that there are infinitely many primes in any arithmetic
progression $an + b$ with $(a, b) = 1$, extended the result that there are infinitely many
primes. Dirichlet's proof (see Chapter 3) used L-series and then an Euler product
formula. Recall that for an integer $k$, a Dirichlet L-series is defined by

$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$$

where $\chi$ is a character mod $k$ and $s$ is a complex variable. Hence Dirichlet's proof was
an extension of the Euler proof of the infinitude of primes using the real zeta series.
Along the same lines both the prime number theorem and the Riemann hypothesis
can be extended to primes in arithmetic progressions.

For $(a, b) = 1$ let

$$\pi(x; a, b) = \text{number of primes } p \le x \text{ with } p \equiv b \bmod a.$$


The prime number theorem for arithmetic progressions can then be expressed as: 
Theorem 4.8.5 (The Prime Number Theorem for Arithmetic Progressions) For fixed
$a, b > 0$ with $(a, b) = 1$,

$$\pi(x; a, b) \sim \frac{\pi(x)}{\phi(a)} \sim \frac{1}{\phi(a)}\,\frac{x}{\ln x}.$$

The result can be expressed in probabilistic terms by saying that the primes are
uniformly distributed in the $\phi(a)$ residue classes relatively prime to $a$. In fact much
of the material on the prime number theorem can be rephrased in terms of probability
theory. The prime number theorem itself can be expressed as:


Theorem 4.8.6 (The Prime Number Theorem) The probability of randomly choosing
a prime less than or equal to $x$ is asymptotically given by $\frac{1}{\ln x}$.
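Theorem 4.8.5 is easy to observe numerically. The Python sketch below (the function name `residue_counts` is ours) counts the primes up to $x$ in each residue class $b$ mod $a$ with $(a, b) = 1$; each count should be close to $\pi(x)/\phi(a)$.

```python
import math


def residue_counts(x, a):
    """Map each b in [0, a) with gcd(a, b) = 1 to the number of primes p <= x with p = b mod a."""
    flags = bytearray([1]) * (x + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    counts = {b: 0 for b in range(a) if math.gcd(a, b) == 1}
    for p in range(2, x + 1):
        if flags[p] and p % a in counts:   # primes dividing a fall in no coprime class
            counts[p % a] += 1
    return counts


for a in (4, 10):
    print(a, residue_counts(10**6, a))
```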


Most of the ideas surrounding the use of probabilistic methods are discussed in 
the book Probabilistic Number Theory by Elliott [E]. 

The extension of the Riemann hypothesis to the case of arithmetic progressions is
called the generalized Riemann hypothesis or the extended Riemann hypothesis.
This says that the nontrivial zeros of any Dirichlet L-series also lie along the critical line
$\operatorname{Re} s = \frac{1}{2}$:

Generalized Riemann Hypothesis: For an integer $k$ and any character $\chi$ mod
$k$, the nontrivial zeros of the L-series

$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$$

all lie along the critical line $\operatorname{Re} s = \frac{1}{2}$.

We close this chapter with a brief discussion of primes in short intervals $[x, x + \epsilon x]$,
where $\epsilon > 0$ is a positive constant. Bertrand's theorem (Theorem 4.2.5) showed that
for any real number $x > 1$ there is always a prime in the interval $[x, 2x]$. Further, the
proof used the same methods as the proof of Chebyshev's estimate. As an immediate
consequence of the prime number theorem we can obtain the following result. We
leave the proof to the exercises.


Theorem 4.8.7 For any $\epsilon > 0$ there exists an $x_0 = x_0(\epsilon)$ such that there is always
a prime in the interval $[x, (1 + \epsilon)x]$ for $x \ge x_0$. Equivalently, $\pi(x + y) > \pi(x)$ for
$y = \epsilon x$.


The above theorem and its proof have the following interesting interpretation. For
large $x$ (see the exercises)

$$\pi(2x) - \pi(x) \sim \pi(x).$$

Hence for large $x$ there are asymptotically as many primes between $x$ and $2x$ as
there are less than $x$, despite the fact that by the prime number theorem the density
of primes tends to thin out. However, it can be shown that

$$2\pi(x) - \pi(2x) \to \infty$$

as $x \to \infty$.
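Theorem 4.8.7 only asserts that some $x_0(\epsilon)$ exists. For a concrete $\epsilon$ one can search for the last failure below a bound, as in the Python sketch below (names are ours). For $\epsilon = 1$ it confirms Bertrand's theorem in the range searched. For $\epsilon = \frac{1}{5}$ the expected answer is $x = 24$, because of Nagura's classical result that there is a prime between $n$ and $\frac{6n}{5}$ for every $n \ge 25$. Using `Fraction` for $\epsilon$ keeps $\lfloor (1 + \epsilon)x \rfloor$ exact.

```python
import math
from fractions import Fraction


def last_failure(eps, limit):
    """Largest integer x in [1, limit] whose interval [x, (1 + eps) x] holds no prime, else None."""
    top = math.floor(limit * (1 + eps))
    flags = bytearray([1]) * (top + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(top) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, top + 1, p)))
    pi = [0] * (top + 1)                 # pi[k] = number of primes <= k
    for k in range(1, top + 1):
        pi[k] = pi[k - 1] + flags[k]
    last = None
    for x in range(1, limit + 1):
        hi = math.floor(x * (1 + eps))   # exact, because eps is a Fraction
        if pi[hi] - pi[x - 1] == 0:      # no prime in [x, hi]
            last = x
    return last


print(last_failure(Fraction(1), 10**5))      # Bertrand: [x, 2x] always holds a prime
print(last_failure(Fraction(1, 5), 10**5))
```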

The result given in Theorem 4.8.7 has been improved upon in various ways.
Huxley in 1972, continuing a long line of research in this direction, showed that there
is always a prime in the interval $[x, x + x^c]$ if $c > \frac{7}{12}$, for large enough $x$. The value of
$c$ has subsequently been improved, with Baker and Harman reducing $c$ to $0.535$,
again for large enough $x$. Further, Baker and Harman show that

$$\pi(x + x^{0.535}) - \pi(x) \ge c_1 \frac{x^{0.535}}{\ln x}$$

for some constant $c_1 > 0$ and large enough $x$.

Earlier Erdős, using Selberg's formula, had proved that for each $\epsilon > 0$ there exists
a constant $c(\epsilon)$ such that in the interval $[x, (1 + \epsilon)x]$ there are at least $c(\epsilon)\frac{x}{\ln x}$ primes.
Finally, we mention the following remarkable result, which is a consequence of
Bertrand's theorem. We outline a proof in the exercises.


Theorem 4.8.8 Given any positive integer $n$, the set of integers $\{1, 2, \ldots, 2n\}$ can be
partitioned into $n$ disjoint pairs so that the sum of each pair is a prime.


So, for example, $\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$ can be partitioned into
$\{1, 10\}, \{2, 9\}, \{3, 4\}, \{5, 8\}, \{6, 7\}$.


The result is in the same spirit as the Goldbach conjecture, which states that any
even integer greater than 2 is the sum of two primes.
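The inductive argument behind Theorem 4.8.8 (outlined in Exercise 4.20) is constructive. The minimal Python sketch below uses our own names. It takes the largest remaining number $t$ and finds a prime $q$ with $t < q < 2t$ (Bertrand's theorem; we simply take the smallest such prime). It then pairs $m = q - t$ with $t$, $m + 1$ with $t - 1$, and so on, and repeats the process on $\{1, \ldots, m - 1\}$.

```python
def is_prime(k):
    """Trial division; adequate for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def prime_pairing(n):
    """Partition {1, ..., 2n} into n pairs, each with a prime sum."""
    pairs = []
    top = 2 * n
    while top > 0:
        # Bertrand's theorem guarantees a prime q with top < q < 2 * top.
        q = top + 1
        while not is_prime(q):
            q += 1
        m = q - top              # odd, since q is odd and top is even; 1 <= m < top
        lo, hi = m, top
        while lo < hi:           # {m, ..., top} has an even number of elements
            pairs.append((lo, hi))   # lo + hi == q
            lo, hi = lo + 1, hi - 1
        top = m - 1              # still even; continue with {1, ..., m - 1}
    return pairs


print(prime_pairing(5))
```

For instance, `prime_pairing(5)` returns `[(1, 10), (2, 9), (3, 8), (4, 7), (5, 6)]`, a different but equally valid partition from the one in the text.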
4.9 Exercises 


4.1 Show that $\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}$ is asymptotically equal to $\frac{x}{\ln x}$. (Hint: Take the Taylor
expansion of $\operatorname{Li}(x)$.)


4.2 If $p_n$ is the $n$th prime, show that $\lim_{n \to \infty} \frac{p_n}{n \ln n} = 1$.


Recall that the binomial coefficient $\binom{n}{k}$ (see Section 4.2) is defined by

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}.$$

4.3 Prove the following facts about $\binom{n}{k}$:


(a) $\binom{n}{k}$ represents the number of ways of choosing $k$ objects out of $n$ without
replacement and without order (Lemma 4.2.1). This is equivalent to the number of
possible subsets of size $k$ in a finite set with $n$ elements. (Hint: Consider the number
of ways of choosing $k$ out of $n$ with order; this would be $n(n - 1) \cdots (n - k + 1)$.
Then consider how many ways each choice of $k$ objects can be rearranged.)


(b) $\binom{n}{k} = \binom{n}{n - k}$.

(c) $\binom{n}{k} + \binom{n}{k - 1} = \binom{n + 1}{k}$. (This is the basis for Pascal's triangle.)


4.4 Prove the Binomial Theorem: For any real numbers $a$, $b$ and natural number
$n$ we have

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.$$

(Hint: Use induction and part (c) of Exercise 4.3.)


4.5 Prove: For a prime $p$, $(x + y)^p \equiv x^p + y^p \bmod p$. (Hence the beginning
algebra mistake $(x + y)^p = x^p + y^p$ is true in the field $\mathbb{Z}_p$.)


4.6 If $s > 0$ the Gamma function is given by

$$\Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x}\,dx.$$

Show that
(a) $\Gamma(s + 1) = s\Gamma(s)$. (Use integration by parts.)
(b) $\Gamma(n) = (n - 1)!$ for any $n \ge 1$, $n \in \mathbb{N}$.


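4.7 (a) Show that $\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$. (Hint: Let $A = \int_0^{\infty} e^{-x^2}\,dx$. Then

$$A^2 = \left(\int_0^{\infty} e^{-x^2}\,dx\right)\left(\int_0^{\infty} e^{-y^2}\,dy\right) = \int_0^{\infty}\int_0^{\infty} e^{-(x^2 + y^2)}\,dx\,dy.$$

Now change to polar coordinates. Recall that $dx\,dy = r\,dr\,d\theta$.)
(b) Use part (a) to show that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.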


4.8 Recall that Stirling's Approximation is

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$

We outline a proof of this result.
(a) From Problem 4.6, Stirling's approximation is equivalent to

$$\Gamma(p + 1) \approx p^p e^{-p}\sqrt{2\pi p}.$$

(b) Write the integral for $\Gamma(p + 1)$ as follows:

$$\Gamma(p + 1) = \int_0^{\infty} x^p e^{-x}\,dx = \int_0^{\infty} e^{p \ln x - x}\,dx.$$


Now substitute the variable $x = p + y\sqrt{p}$ so that $dx = \sqrt{p}\,dy$. Show then that

$$\Gamma(p + 1) = \int_{-\sqrt{p}}^{\infty} e^{p \ln(p + \sqrt{p}\,y) - p - \sqrt{p}\,y}\sqrt{p}\,dy.$$


(c) By looking at the Taylor series for $\ln x$ show that for large $p$

$$\ln(p + \sqrt{p}\,y) = \ln p + \ln\left(1 + \frac{y}{\sqrt{p}}\right) \approx \ln p + \frac{y}{\sqrt{p}} - \frac{y^2}{2p}.$$


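(d) By using part (c) and the integral in part (b) show that

$$\Gamma(p + 1) \approx \sqrt{p}\,p^p e^{-p}\int_{-\sqrt{p}}^{\infty} e^{-\frac{y^2}{2}}\,dy = \sqrt{p}\,p^p e^{-p}\left(\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy - \int_{-\infty}^{-\sqrt{p}} e^{-\frac{y^2}{2}}\,dy\right).$$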


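(e) Evaluate the two integrals in part (d) to get Stirling's approximation. Notice
that from Exercise 4.7 we have

$$\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$

and so

$$\int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\,dx = \sqrt{2\pi}$$

and

$$\int_{-\infty}^{-\sqrt{p}} e^{-\frac{y^2}{2}}\,dy$$

goes to zero as $p$ goes to infinity.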

4.9 Use the prime number theorem to give an alternative proof that there are
arbitrarily large gaps in the sequence of primes. (Hint: Suppose that there is a bound
$A$ so that there is always a prime between $x$ and $x + A$. Then consider $\pi(nA)$ to
deduce a contradiction.)


4.10 Show that f(x) ~ g(x) is equivalent to f(x) = g(x)(1 + o(1)). 


4.11 Show that f = o(g) implies f = O(g). 
4.12 Show that:

(a) $\cos x = O(1)$,

(b) $\sin x = o(x)$,

(c) $x = o(x^d)$ if $d > 1$,

(d) If $P(x)$ is a polynomial of degree $n$ with leading coefficient $a$ then $P(x) \sim ax^n$.


4.13 (a) Show that if $f = O(1)$ and $g = O(1)$ then $f + g = O(1)$, or equivalently
$O(1) + O(1) = O(1)$.
(b) Show that $O(1) = o(x)$.


4.14 Show that $\frac{\ln x}{x^{\delta}} \to 0$ as $x \to \infty$ for any $\delta > 0$. Equivalently $\ln x = o(x^{\delta})$.
Hence $\ln x$ goes to infinity more slowly than any positive power of $x$.


4.15 Using Bertrand's theorem show that $p_{n+1} < 2p_n$, where $p_n$ is the $n$th prime.


4.16 Prove that for each $\epsilon > 0$ there exists an $x_0 = x_0(\epsilon)$ such that there is always
a prime in the interval $[x, (1 + \epsilon)x]$ for $x \ge x_0$. (Hint: Consider $\pi(x + \epsilon x) - \pi(x)$
and apply the prime number theorem.)


4.17 Show that $\pi(2x) - \pi(x) \sim \pi(x)$. Hence, asymptotically there are as many
primes between $x$ and $2x$ as there are less than $x$.


4.18 Prove that

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}, \quad s > 1,$$

where $\mu(n)$ is the Möbius function.


4.19 Prove that the set of rationals of the form $\frac{p}{q}$, with $p, q$ primes, is dense in the set
of positive reals. Recall that a set $S$ is dense in the reals if given any real number $r$
and $\epsilon > 0$ there is an $s \in S$ with $|r - s| < \epsilon$.


4.20 Prove Theorem 4.8.8: Given any positive integer $n$ the set of integers
$\{1, 2, \ldots, 2n\}$ can be partitioned into $n$ disjoint pairs so that the sum of each pair is a
prime. (Hint: Use induction and then notice that for $n = 2k$ by Bertrand's Theorem
there exists an $m$ with $1 \le m < 2k$ such that $2k + m$ is prime.)


4.21 Prove that the equation $n! = m^k$ has no solutions in integers with $m, n, k > 1$.
4.22 Prove that there exist real numbers $a, b > 0$ such that for all $n \ge 2$

$$a n^2 \ln n < \sum_{i=1}^{n} p_i < b n^2 \ln n$$

with $p_i$ the $i$th prime.


4.23 Let $\Lambda(n)$ be the von Mangoldt function. Prove that

$$\sum_{d \mid n} \Lambda(d) = \ln n,$$

or equivalently $\Lambda = \mu * L$, where $L(n) = \ln n$.


4.24 Prove the following orthogonality relations among the trigonometric functions:

(a) $\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = 0$ if $m \ne n$; $= \pi$ if $m = n \ne 0$; $= 2\pi$ if $m = n = 0$.

(b) $\int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = 0$ if $m \ne n$; $= \pi$ if $m = n \ne 0$.

(c) $\int_{-\pi}^{\pi} \cos(mx)\sin(nx)\,dx = 0$ for all $m, n$.


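4.22 Use the previous problem to show that if $f(x)$ is a periodic function with
period $2L$ and Fourier series

$$F(x) = a_0 + \sum_{n=1}^{\infty}\left(a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right)$$

then, if $f(x) = F(x)$, the coefficients $a_0, a_n, b_n$ must be given by

$$a_0 = \frac{1}{2L}\int_{-L}^{L} f(x)\,dx,$$

$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots$$

$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots$$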


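4.23 Using the formula for complements

$$\Gamma(s)\Gamma(1 - s) = \frac{\pi}{\sin(\pi s)}$$

and the duplication formula

$$\Gamma(s)\Gamma\left(s + \frac{1}{2}\right) = \sqrt{\pi}\,2^{1 - 2s}\,\Gamma(2s),$$

show that the relation

$$\pi^{-\frac{s}{2}}\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-\frac{1-s}{2}}\Gamma\left(\frac{1-s}{2}\right)\zeta(1 - s)$$

can be transformed into

$$\zeta(s) = 2^s\pi^{s-1}\sin\left(\frac{\pi s}{2}\right)\Gamma(1 - s)\,\zeta(1 - s), \quad s \ne 0, 1.$$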


4.24 Prove Theorem 4.6.3: The set of number theoretic functions with addition
defined pointwise and multiplication given by Dirichlet convolution forms a
commutative ring with an identity.


4.25 Prove Euler's identity for the cotangent function

$$\pi\cot(\pi x) = \frac{1}{x} + \sum_{n=1}^{\infty} \frac{2x}{x^2 - n^2}.$$


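4.26 Prove that the Taylor expansion of the logarithm of the Gamma function at
$z = 0$ is given by

$$\log\Gamma(1 - z) = \gamma z + \sum_{k=2}^{\infty} \frac{\zeta(k)}{k} z^k,$$

where $\gamma$ is the Euler-Mascheroni constant

$$\gamma = \lim_{n \to \infty}\left(\sum_{k=1}^{n} \frac{1}{k} - \ln n\right).$$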


Chapter 5 
Primality Testing—An Overview 


5.1 Primality Testing and Factorization 


In the previous two chapters we have seen that there are infinitely many primes and
showed that as we move through larger and larger integers, the density of primes
thins out. In particular we proved that

$$\pi(x) \sim \frac{x}{\ln x},$$

where $\pi(x)$ represents the number of primes less than or equal to the positive real number $x$. This
result, the prime number theorem, could be interpreted as saying that the probability
of randomly choosing a prime number less than or equal to a positive real number
$x$ is approximately $\frac{1}{\ln x}$ as $x$ gets large. In this chapter we consider the question of
determining whether a particular given positive integer $n$ is prime or not prime.
The methods concerning this problem are called primality testing and consist of 
algorithms to determine whether or not an inputted positive integer is prime. Primality 
testing has become extremely important and has been of great interest in recent years 
due to its close ties to cryptography and especially public key cryptography. 
Cryptography is the science of encoding and decoding secret messages. Many of the 
most powerful and secure encoding methods depend on number theory, especially 
on the computational difficulty of factoring large integers. It turns out, somewhat 
surprisingly, that relative to ease of computation, determining if a number is prime 
is easier than actually factoring it. 

Public key cryptography is that part of cryptography that deals with sending secret 
(or hopefully secure) messages across public communications systems. The major 
algorithm in this area, called the RSA algorithm, depends directly on the difficulty 
of factoring large integers. We will briefly introduce cryptography and the RSA 
algorithm in Section 5.4. First we take a short overview look at primality testing. 

At first glance, the problem of determining if a positive integer $n$ is prime seems
like an easy one. If $n$ is not prime it must have a divisor $m$ with $1 < m < n$, and such
a divisor satisfies $m \le \frac{n}{2}$. Therefore test all integers $2, \ldots, \frac{n}{2}$ to see if they divide $n$ or not.
If there is such a divisor then $n$ is composite. If not, then $n$ is prime.

Of course this can be improved in several ways. First of all, if $n = mk$ then one
of $m, k$ must be $\le \sqrt{n}$. Hence we need only check integers from $2$ to $\sqrt{n}$ rather than
from $2$ to $\frac{n}{2}$. Further, if $n$ has a divisor $m$ with $1 < m \le \sqrt{n}$ then $n$ must have a prime
divisor $p$ with $1 < p \le \sqrt{n}$. Therefore it is only necessary to check the primes $\le \sqrt{n}$.
Consequently, knowing all the primes $\le \sqrt{n}$ allows us to test for primality all the integers
$\le n$. We summarize all these comments to give a general algorithm for primality
testing.

General Algorithm for Primality Testing: Given $n > 1$, test all primes $p$ with
$p \le \sqrt{n}$. The integer $n$ is prime if and only if none of these primes divides $n$.

EXAMPLE 5.1.1 Test whether the integer 83 is prime.

Now $9 < \sqrt{83} < 10$, so we must test all the primes less than 9. Hence we must
test $2, 3, 5, 7$. None of these divides 83 and therefore 83 is prime.
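The general algorithm translates directly into code. In this Python sketch (names are ours) a small sieve produces the primes up to $\sqrt{n}$, mirroring the remark that knowing the primes $\le \sqrt{n}$ is enough.

```python
import math


def primes_upto(bound):
    """Primes <= bound (an empty list if bound < 2)."""
    if bound < 2:
        return []
    flags = [True] * (bound + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(bound) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [k for k in range(2, bound + 1) if flags[k]]


def is_prime(n):
    """Trial division of n > 1 by every prime p <= sqrt(n)."""
    if n < 2:
        return False
    return all(n % p != 0 for p in primes_upto(math.isqrt(n)))


print(primes_upto(math.isqrt(83)))   # the primes tested in Example 5.1.1
print(is_prime(83))
```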

This general algorithm is simple and always works. However, it becomes
computationally infeasible for large integers. Therefore other methods become necessary
to determine primality. Most of these methods rely on a number theoretic property,
such as Fermat's theorem, which is true for all primes but may not be true for all
composites. Recall that Fermat's theorem (see Chapter 2) says that $a^{p-1} \equiv 1 \bmod p$
for any prime $p$ and for any $a$ with $1 \le a < p$. We will return to this in Section 5.3.
In the next section we examine a series of techniques for determining primes called
sieving methods.
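Fermat's theorem yields a cheap test that can prove compositeness. If $a^{n-1} \not\equiv 1 \bmod n$ for some $a$ with $1 < a < n$, then $n$ is composite. A passing test proves nothing, as $341 = 11 \cdot 31$ shows. The Python sketch below is only an illustration (the text returns to such tests in Section 5.3); it uses the built-in three-argument `pow` for modular exponentiation, and the function name is ours.

```python
def fermat_passes(n, a):
    """True if a^(n-1) = 1 (mod n). A prime n passes for every a with 1 <= a < n."""
    return pow(a, n - 1, n) == 1


print(fermat_passes(83, 2))    # 83 is prime, so this must hold
print(fermat_passes(91, 2))    # 91 = 7 * 13: the test fails, proving 91 composite
print(fermat_passes(341, 2))   # 341 = 11 * 31 passes for a = 2 anyway
print(fermat_passes(341, 3))   # but the base a = 3 exposes it
```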


5.2 Sieving Methods 


In ordinary language a sieve is a device to separate or sift finer particles from coarser
particles. This idea has been applied to number theory via numerical sieving
methods. A sieve in number theory is a method or procedure to find numbers with desired
properties (for example, primes) by sifting through all the positive integers up to a
certain bound, successively eliminating invalid candidates until only numbers with the
particular attributes desired are left. Sieving methods are quite effective for obtaining
lists of primes (and numbers with other characteristics) up to a reasonably small
limit.

Relative to generating lists of primes, sieving methods originated with the Sieve 
of Eratosthenes. This is a straightforward method to obtain all the primes less than 
or equal to a fixed bound x. It is ascribed (as the name suggests) to Eratosthenes 
(276-194 B.C.) who was the chief librarian of the great ancient library in Alexandria. 
Besides the sieve method he was an influential scientist and scholar in the ancient 
world, developing a chronology of ancient history (up to that point) and helping 
to obtain an accurate measure (within the measurement errors of his time) of the 
dimensions of the Earth. 

The method of the Sieve of Eratosthenes is direct and works as follows. Given
$x > 0$, list all the positive integers less than or equal to $x$. Starting with 2, which is
prime, cross out all proper multiples of 2 on the list. The next number on the list
not crossed out, which is 3, is prime. Now cross out all the proper multiples of 3
not already eliminated. The next number left uneliminated, 5, is prime. Continue in
this manner. As explained for the primality test described in the previous section, the
elimination need only be done for primes $\le \sqrt{x}$. Upon completion of this process,
any number except 1 that is not crossed out must be a prime.

Below we exhibit the Sieve of Eratosthenes for numbers $\le 100$. In beginning
each round of elimination we need only consider numbers $\le \sqrt{100} = 10$.


[Display: the integers 1 to 100 in a 10 × 10 array, with the proper multiples of 2, 3, 5 and 7 crossed out.]


After completing the sieving operation we obtain the list

$\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97\}$

which comprises all the primes less than or equal to 100.
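Here is a direct Python rendering of the sieve, written as a sketch (the name `eratosthenes` is ours). It uses one standard refinement: for a prime $p$, crossing out starts at $p^2$, since smaller multiples of $p$ have a smaller prime factor and were already removed. As in the text, the outer loop stops at $\sqrt{x}$.

```python
import math


def eratosthenes(x):
    """Return the list of primes <= x."""
    if x < 2:
        return []
    crossed_out = [False] * (x + 1)
    for p in range(2, math.isqrt(x) + 1):
        if not crossed_out[p]:              # p survived, so it is prime
            for multiple in range(p * p, x + 1, p):
                crossed_out[multiple] = True
    return [k for k in range(2, x + 1) if not crossed_out[k]]


print(eratosthenes(100))
```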

Given positive integers $m$ and $x$, a slight modification of the Sieve of Eratosthenes
can be used to determine all the positive integers relatively prime to $m$ and less than
or equal to $x$.

Suppose we are given $m$ and $x$. Let $p_1, \ldots, p_k$ be the distinct prime factors of
$m$ arranged in ascending order, that is, $p_1 < p_2 < \cdots < p_k$. Next list all the positive
integers less than or equal to $x$ as we did for the ordinary sieve. Start with $p_1$ and
eliminate all multiples of $p_1$ on the list. Then successively do the same for $p_2$ through
$p_k$. The numbers remaining on the list are precisely those relatively prime to $m$ that
are also less than or equal to $x$. If $p_j > x$, ignore this prime and all higher primes.

Below we exhibit the sieve applied to finding the numbers less than or equal to 50 and
relatively prime to 180.

Since $180 = 2^2 \cdot 3^2 \cdot 5$ we must sieve out multiples of 2, 3 and 5.
[Display: the integers 1 to 50 in a 5 × 10 array, with the multiples of 2, 3 and 5 crossed out.]


The remaining list is 
{1,7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49}. 


These are all relatively prime to 180. Recall that these numbers then are all units 
modulo 180. 
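The modified sieve is just as short. In this Python sketch (names are ours) trial division finds the distinct prime factors of $m$, and every multiple of each of them up to $x$ is struck from the list.

```python
def distinct_prime_factors(m):
    """Distinct primes dividing m, in ascending order (trial division)."""
    factors, d = [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def coprime_sieve(m, x):
    """Positive integers <= x that are relatively prime to m."""
    keep = [True] * (x + 1)
    for p in distinct_prime_factors(m):
        if p > x:
            break                        # this prime and all larger ones can be ignored
        for multiple in range(p, x + 1, p):
            keep[multiple] = False
    return [k for k in range(1, x + 1) if keep[k]]


print(coprime_sieve(180, 50))
```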

Legendre in 1808, in an attempt to determine the distribution $\pi(x)$ of primes,
derived a computational formula for the Sieve of Eratosthenes. Recall (see Chapter 4)
that Legendre had conjectured the prime number theorem in the form

$$\pi(x) \approx \frac{x}{\ln x - 1.08366}.$$


We first present a slightly more general form of Legendre's formula. Given a
positive integer $m$ and a positive $x$ let

$$N_m(x) = \text{number of positive integers} \le x \text{ and relatively prime to } m.$$

This is precisely the size of the list obtained in the modified Sieve of Eratosthenes
described above. We obtain:


Theorem 5.2.1 (Legendre's Formula for the Sieve of Eratosthenes) Let $m \in \mathbb{N}$, $x > 0$.
Then

$$N_m(x) = \sum_{d \mid m} \mu(d)\left[\frac{x}{d}\right]$$

where $\mu(d)$ is the Möbius function and $[\ ]$ is the greatest integer function.
Proof If $m = 1$ then clearly

$$N_1(x) = [x].$$


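Now given $m > 1$ let $p_1 < p_2 < \cdots < p_k$ be the distinct prime factors of $m$, and for
each $j$ with $1 \le j \le k$ let $m_j = p_1 p_2 \cdots p_j$ (and set $m_0 = 1$).

For a given $m_j$ the only integers counted by $N_{m_j}(x)$ and not counted by $N_{m_{j+1}}(x)$ are
those of the form $p_{j+1}n \le x$ where $(n, m_j) = 1$. It then follows that

$$N_{m_j}(x) - N_{m_{j+1}}(x) = N_{m_j}\left(\frac{x}{p_{j+1}}\right).$$

Applying this repeatedly we obtain

$$N_{m_1}(x) = N_1(x) - N_1\left(\frac{x}{p_1}\right) = [x] - \left[\frac{x}{p_1}\right],$$

$$N_{m_2}(x) = N_{m_1}(x) - N_{m_1}\left(\frac{x}{p_2}\right) = [x] - \left[\frac{x}{p_1}\right] - \left[\frac{x}{p_2}\right] + \left[\frac{x}{p_1 p_2}\right].$$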
Continuing in this manner inductively we arrive at

$$N_m(x) = \sum_{d \mid \bar{m}} (-1)^{\omega(d)}\left[\frac{x}{d}\right] \qquad (5.2.1)$$

where $\bar{m} = p_1 p_2 \cdots p_k$ and $\omega(d)$ is the number of distinct prime factors of $d$. The
integer $\bar{m}$ is called the square-free kernel of $m$. This can then be expressed in terms
of the Möbius function. Recall (see Chapter 2 and Section 3.6) that the Möbius
function is defined by

$$\mu(d) = \begin{cases} (-1)^{\omega(d)} & \text{if } d \text{ is squarefree} \\ 0 & \text{otherwise.} \end{cases}$$

Substituting this in the form of Legendre's formula (5.2.1) and realizing that
$\mu(d) = 0$ except for the factors of the square-free kernel, we obtain

$$N_m(x) = \sum_{d \mid m} \mu(d)\left[\frac{x}{d}\right] \qquad (5.2.2)$$

proving the theorem.
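Formula (5.2.2) can be checked against a direct count. The Python sketch below (names are ours) runs over the squarefree divisors $d$ of $m$, built as products of subsets of the distinct prime factors, with $\mu(d) = (-1)^r$ for a product of $r$ primes. It assumes integer $x$, so that $[x/d]$ is `x // d`.

```python
from itertools import combinations
from math import gcd, prod


def distinct_prime_factors(m):
    """Distinct primes dividing m (trial division)."""
    factors, d = [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def legendre_count(m, x):
    """N_m(x) = sum over d | m of mu(d) * [x / d]; only squarefree d contribute."""
    primes = distinct_prime_factors(m)
    total = 0
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            d = prod(subset)               # prod(()) == 1 gives the d = 1 term
            total += (-1) ** r * (x // d)
    return total


def brute_count(m, x):
    """Direct count of 1 <= k <= x with gcd(k, m) = 1."""
    return sum(1 for k in range(1, x + 1) if gcd(k, m) == 1)


print(legendre_count(180, 50), brute_count(180, 50))
```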


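Now let $x \ge 2$ and let

$$m = \prod_{p \le \sqrt{x}} p$$

where $p$ is prime. Then $N_m(x)$ counts the integer 1 together with the primes in the interval $(\sqrt{x}, x]$.
It follows that

$$N_m(x) = \pi(x) - \pi(\sqrt{x}) + 1.$$

Substituting Legendre's formula (5.2.2) into this expression we obtain as a corollary:
Corollary 5.2.1 For $x \ge 2$

$$\pi(x) = -1 + \pi(\sqrt{x}) + \sum_{\nu(d) \le \sqrt{x}} \mu(d)\left[\frac{x}{d}\right]$$

where $\nu(d)$ is the greatest prime factor of $d$ (with $\nu(1) = 1$, so that $d = 1$ is included).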
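Corollary 5.2.1 can be evaluated directly for small $x$. Because $[x/d] = 0$ once $d > x$, the Python sketch below (names are ours) generates only the squarefree products $d \le x$ of primes $\le \sqrt{x}$, depth first. Even so the number of terms grows quickly, in line with the discussion that follows.

```python
import math


def primes_upto(bound):
    """Primes <= bound (an empty list if bound < 2)."""
    if bound < 2:
        return []
    flags = [True] * (bound + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(bound) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [k for k in range(2, bound + 1) if flags[k]]


def legendre_pi(x):
    """pi(x) = -1 + pi(sqrt x) + sum of mu(d) [x/d] over squarefree d built from primes <= sqrt x."""
    small = primes_upto(math.isqrt(x))
    total = 0

    def walk(start, d, sign):
        nonlocal total
        total += sign * (x // d)          # the term mu(d) [x / d]
        for i in range(start, len(small)):
            nd = d * small[i]
            if nd > x:                    # primes ascend, so every later product is too big
                break
            walk(i + 1, nd, -sign)

    walk(0, 1, 1)
    return -1 + len(small) + total


print(legendre_pi(100), legendre_pi(1000))
```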


Although this gives a formula for $\pi(x)$, it is essentially useless for actually computing
$\pi(x)$ for large $x$, or for shedding any light on the prime number theorem. First of all,
if we estimate $\left[\frac{x}{d}\right]$ by $\frac{x}{d} + O(1)$ and substitute in the formula we have

$$\pi(x) - \pi(\sqrt{x}) + 1 = \sum_{\nu(d) \le \sqrt{x}} \mu(d)\left(\frac{x}{d} + O(1)\right) = x\prod_{p \le \sqrt{x}}\left(1 - \frac{1}{p}\right) + O\left(2^{\pi(\sqrt{x})}\right).$$

Hence the error term is exponentially larger than the main term. Further, the number
of steps in the Sieve of Eratosthenes, and hence in the computation of the formula, is
proportional to $\sum_{p \le \sqrt{x}} \frac{x}{p}$. However, it can be shown that

$$\sum_{p \le x} \frac{x}{p} = x\ln\ln x + O(x)$$

(see [CP] page 113 and [HW] Theorem 427). Therefore the number of steps is
proportional to $x\ln\ln x$, that is, about $\ln\ln x$ steps per integer, which goes to infinity
(albeit slowly) with $x$. In addition, from a computer/computational point of view, one
of the major drawbacks to implementing the Sieve of Eratosthenes (for large $x$) is the
computer space it requires (see [CP]), which can be substantial. We mention that Brun
attempted to make Legendre's formula computable. As an application he was able to prove
the spectacular result that the sum of the reciprocals of the twin primes

$$\sum_{p,\ p + 2 \text{ primes}}\left(\frac{1}{p} + \frac{1}{p + 2}\right)$$

converges. We will look at Brun's method and his proof of this result in the next
section. We note that a further slight modification of the Sieve of Eratosthenes can be
utilized to obtain a complete prime factorization of a positive integer $n$.

Meissel in 1870 also gave an improvement to Legendre's formula and was able to
use this technique to compute $\pi(x)$ up to $x = 10^8$.


Theorem 5.2.2 (Meissel's Formula) Let $p_1 < p_2 < \cdots < p_n < \cdots$ be the listing of
the primes in increasing order, so that $p_j$ is the $j$th prime. Let $x \ge 4$, $n = \pi(\sqrt{x})$ and
$m_n = p_1 \cdots p_n$. Then

$$\pi(x) = N_{m_m}(x) + m(s + 1) + \frac{s(s - 1)}{2} - 1 - \sum_{j=1}^{s} \pi\left(\frac{x}{p_{m+j}}\right)$$

where $m = \pi(x^{\frac{1}{3}})$ and $s = n - m$.


Proof From the proof of Legendre's formula we have

$$N_{m_j}(x) - N_{m_{j+1}}(x) = N_{m_j}\left(\frac{x}{p_{j+1}}\right).$$

This holds for $0 \le j < n$. Summing this equality for $j = m, m + 1, \ldots, n - 1$ we obtain

$$N_{m_m}(x) - N_{m_n}(x) = \sum_{j=1}^{s} N_{m_{m+j-1}}\left(\frac{x}{p_{m+j}}\right).$$

The inequalities

$$\frac{x}{p_{m+j}} < x^{\frac{2}{3}} < p_{m+j}^2,$$

holding for $j = 1, 2, \ldots, s$, then imply that

$$N_{m_n}(x) = 1 + \pi(x) - \pi(\sqrt{x}) = \pi(x) - n + 1$$

and

$$N_{m_{m+j-1}}\left(\frac{x}{p_{m+j}}\right) = 1 + \pi\left(\frac{x}{p_{m+j}}\right) - \pi(p_{m+j-1}) = \pi\left(\frac{x}{p_{m+j}}\right) - (m + j - 2).$$

Therefore

$$\pi(x) = N_{m_n}(x) + n - 1 = N_{m_m}(x) - \sum_{j=1}^{s}\left(\pi\left(\frac{x}{p_{m+j}}\right) - m - j + 2\right) + n - 1$$

$$= N_{m_m}(x) - \sum_{j=1}^{s}\pi\left(\frac{x}{p_{m+j}}\right) + m(s + 1) + \frac{s(s - 1)}{2} - 1,$$

proving the theorem.
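Meissel's formula can be checked numerically. The Python sketch below is only a correctness check, not an efficient implementation. It sieves all the way to $x$ to evaluate the values $\pi(x/p_{m+j})$, and it counts $N_{m_m}(x)$ naively; in practice one would evaluate that term with Legendre's formula or the recurrence of Corollary 5.2.2. The names are ours.

```python
import math
from bisect import bisect_right


def primes_upto(bound):
    """Primes <= bound, for bound >= 2."""
    flags = [True] * (bound + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(bound) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [k for k in range(2, bound + 1) if flags[k]]


def meissel_pi(x):
    """pi(x) for x >= 4 via Theorem 5.2.2."""
    primes = primes_upto(x)

    def pi(y):
        return bisect_right(primes, y)         # number of primes <= y

    n = pi(math.isqrt(x))                      # n = pi(sqrt x)
    m = sum(1 for p in primes if p ** 3 <= x)  # m = pi(x^(1/3)), avoiding floating point
    s = n - m
    first_m = primes[:m]
    # N_{m_m}(x): integers in [1, x] divisible by none of p_1, ..., p_m.
    n_mm = sum(1 for k in range(1, x + 1) if all(k % p for p in first_m))
    # p_{m+j} is primes[m + j - 1] because the list is 0-indexed.
    correction = sum(pi(x // primes[m + j - 1]) for j in range(1, s + 1))
    return n_mm + m * (s + 1) + s * (s - 1) // 2 - 1 - correction


print(meissel_pi(100), meissel_pi(10**4))
```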


Note that $N_n(n)$ is the total number of integers less than or equal to $n$ and relatively prime to
$n$. Hence

$$N_n(n) = \phi(n),$$

the Euler phi function introduced in Chapter 2. Applying Legendre's formula with
$m = n = x$ we obtain

$$\phi(n) = \sum_{d \mid n} \mu(d)\frac{n}{d} = n\prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$

This recovers the formulas given for $\phi(n)$ in Theorems 2.4.7 and 2.4.8.
A variation of Legendre's formula can be obtained in the following manner.
Suppose

$$p_1 < p_2 < \cdots < p_k < \cdots$$

are the primes listed in increasing order. Let $x \ge 2$ and let

$$\Phi(x, k)$$

be the number of positive integers $\le x$ not divisible by any of the first $k$ primes. Hence

$$\Phi(x, k) = N_m(x)$$

if the square-free kernel of $m$ is $p_1 \cdots p_k$. The same counting arguments applied to
this function lead us to the next result.


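Theorem 5.2.3 Let the function $\Phi$ be defined as above. Then

$$\Phi(x, k) = \lfloor x \rfloor - \sum_{1 \le i \le k} \left\lfloor \frac{x}{p_i} \right\rfloor + \sum_{1 \le i < j \le k} \left\lfloor \frac{x}{p_ip_j} \right\rfloor - \sum_{1 \le i < j < l \le k} \left\lfloor \frac{x}{p_ip_jp_l} \right\rfloor + \cdots$$

where each sum is over sets of pairwise distinct primes taken from $p_1, \ldots, p_k$.

Taking $k = \pi(\sqrt{x})$, the integers counted by $\Phi(x, k)$ are $1$ and the primes in $(\sqrt{x}, x]$, so

$$\begin{aligned}
\Phi(x, \pi(\sqrt{x})) &= \pi(x) - \pi(\sqrt{x}) + 1\\
&= \lfloor x \rfloor - \sum_{p_i \le \sqrt{x}} \left\lfloor \frac{x}{p_i} \right\rfloor + \sum_{p_i < p_j \le \sqrt{x}} \left\lfloor \frac{x}{p_ip_j} \right\rfloor - \sum_{p_i < p_j < p_l \le \sqrt{x}} \left\lfloor \frac{x}{p_ip_jp_l} \right\rfloor + \cdots.
\end{aligned}$$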


This version of Legendre’s formula satisfies a very nice recurrence relation. 


Corollary 5.2.2 Let the function $\Phi$ be defined as above. Then

$$\Phi(x, k) = \Phi(x, k-1) - \Phi\left(\frac{x}{p_k}, k-1\right).$$
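The recurrence of Corollary 5.2.2, together with $\Phi(x, \pi(\sqrt{x})) = \pi(x) - \pi(\sqrt{x}) + 1$, already gives a complete prime-counting method. The Python sketch below, with hypothetical names `primes_up_to` and `legendre_pi`, memoizes $\Phi$; it relies on $\lfloor \lfloor x \rfloor / p \rfloor = \lfloor x/p \rfloor$, so integer division is enough. It is an illustration rather than a tuned implementation.

```python
from functools import lru_cache
from math import isqrt


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def legendre_pi(x: int) -> int:
    """pi(x) = Phi(x, a) + a - 1 with a = pi(sqrt(x)), Phi from Corollary 5.2.2."""
    if x < 2:
        return 0
    small = primes_up_to(isqrt(x))            # p_1, ..., p_a
    a = len(small)

    @lru_cache(maxsize=None)
    def phi(y: int, k: int) -> int:
        # integers 1..y divisible by none of p_1, ..., p_k
        if k == 0 or y == 0:
            return y
        # Phi(y, k) = Phi(y, k - 1) - Phi(y / p_k, k - 1)
        return phi(y, k - 1) - phi(y // small[k - 1], k - 1)

    return phi(x, a) + a - 1
```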


There is a very nice visual quadratic sieve which also generates the prime numbers.
Consider the parabola $x = y^2$ and consider the points $(n^2, n)$ and $(n^2, -n)$ lying on the parabola
for $n = 2, 3, \ldots$. Now connect by straight line segments all pairs of such points lying on the two
branches of the parabola, one above and one below the $x$-axis. The intersection
points of these segments with the positive $x$-axis correspond to composite numbers. The
integer points remaining are precisely the primes (see the exercises). In Figure 5.1
we give the picture of this.
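The construction can be simulated exactly with rational arithmetic. Parametrizing the segment from $(a^2, a)$ to $(b^2, -b)$ as $(a^2, a) + t\,(b^2 - a^2, -b - a)$, the $y$-coordinate vanishes at $t = a/(a+b)$, where $x = a^2 + a(b - a) = ab$. The Python sketch below (function names are our own, hypothetical choices) records every such crossing with $ab$ up to a bound and lists the integers that are never hit.

```python
from fractions import Fraction


def x_intercept(a: int, b: int) -> Fraction:
    """Where the segment from (a^2, a) to (b^2, -b) crosses the x-axis."""
    t = Fraction(a, a + b)                 # y = a + t(-b - a) vanishes at t = a/(a+b)
    return a * a + t * (b * b - a * a)     # x = a^2 + t(b^2 - a^2), which equals ab


def parabola_sieve(limit: int) -> list[int]:
    """Integers 2..limit not hit by any segment joining (a^2, a) and (b^2, -b), a, b >= 2."""
    hit = set()
    for a in range(2, limit + 1):
        for b in range(2, limit // a + 1):    # only crossings with ab <= limit matter
            hit.add(x_intercept(a, b))
    return [k for k in range(2, limit + 1) if k not in hit]
```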


5.2.1 Brun’s Sieve and Brun’s Theorem 


The Sieve of Eratosthenes and the extensions of it described in the last section are 
really just the tip of the iceberg as far as sieving methods in number theory are 
concerned (see [HR]). In this section we give one beautiful application by V. Brun 
of a refinement of Legendre’s formula for the Sieve of Eratosthenes. 


5.2 Sieving Methods 227 


Fig. 5.1 Brun’s sieve 


Recall that the twin primes are the pairs $(p, p + 2)$ where both $p$ and $p + 2$
are primes. There are two related, still open, questions concerning this set. Both are
called the twin primes conjecture. The first is that there are infinitely many twin
primes. Empirical evidence and a probabilistic argument suggest that there are
infinitely many such pairs, and most people working in the area feel that this part of
the conjecture is almost certainly true. However, it remains open. The second
twin prime conjecture deals with the density of the twin primes and is in the same
spirit as the prime number theorem.

If we let

$$T(x) = \text{the number of pairs of twin primes } (p, p + 2) \text{ with } p \le x$$

then the second twin prime conjecture, or strong twin prime conjecture, is that

$$T(x) \sim C \int_2^x \frac{dt}{(\ln t)^2}.$$

The constant $C$ is called the twin primes constant and is given by

$$C = 2\Pi_2$$

where

$$\Pi_2 = \prod_{p > 2,\ p \text{ prime}} \left(1 - \frac{1}{(p-1)^2}\right).$$

Sometimes $\Pi_2$ is also called the twin primes constant. The value of $\Pi_2$ has been
computed to a great many decimal places and has the approximate value

$$\Pi_2 \approx 0.660161815\ldots.$$
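To see the strong twin prime conjecture at work, one can compare $T(x)$ with $2\Pi_2 \int_2^x dt/(\ln t)^2$. The Python sketch below is only an illustration under our own assumptions: it uses the truncated value $\Pi_2 \approx 0.660161815$ quoted above, evaluates the integral by the midpoint rule, and the names `prime_flags`, `twin_count` and `hardy_littlewood_prediction` are hypothetical.

```python
from math import isqrt, log


def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 exactly when k is prime, for 0 <= k <= n (n >= 1)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def twin_count(x: int) -> int:
    """T(x): the number of twin prime pairs (p, p + 2) with p <= x."""
    flags = prime_flags(x + 2)
    return sum(1 for p in range(3, x + 1) if flags[p] and flags[p + 2])


def hardy_littlewood_prediction(x: float, pi2: float = 0.660161815,
                                steps: int = 100_000) -> float:
    """2 * Pi_2 * integral from 2 to x of dt / (ln t)^2, by the midpoint rule."""
    h = (x - 2) / steps
    integral = h * sum(1.0 / log(2 + (k + 0.5) * h) ** 2 for k in range(steps))
    return 2 * pi2 * integral


for x in (10**3, 10**4, 10**5):
    print(x, twin_count(x), round(hardy_littlewood_prediction(x)))
```

The two columns agree to within a few percent already in this small range, which is the kind of empirical evidence referred to above.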


Brun proved that there exists an integer $N$ such that

$$T(x) < \frac{100x}{(\ln x)^2} \quad \text{for } x > N.$$
It has further been proved that


<i x 1 Gone 
T(x) < 2a! + O( nee 


) 


where $k$ is a constant. Hardy and Littlewood proposed the value $C = 2\Pi_2$ in the strong
twin primes conjecture.

The strong twin primes conjecture is actually the smallest case of a general
conjecture called the Hardy–Littlewood conjecture or $k$-tuple conjecture.

Here suppose $0 < m_1 < m_2 < \cdots < m_k$ are $k$ odd integers. Then a prime
constellation is a set $\{p, p + 2m_1, p + 2m_2, \ldots, p + 2m_k\}$ where all are primes. If
we let

$$\pi_{m_1, \ldots, m_k}(x)$$

denote the number of prime constellations (relative to a fixed set $\{m_1, \ldots, m_k\}$) with $p$ less
than or equal to $x$, then the $k$-tuple conjecture or Hardy–Littlewood conjecture is
that

$$\pi_{m_1, \ldots, m_k}(x) \sim C(m_1, \ldots, m_k) \int_2^x \frac{dt}{(\ln t)^{k+1}},$$

where $C(m_1, \ldots, m_k)$ is a constant depending only on $m_1, \ldots, m_k$. The strong twin
primes conjecture is the special case of this with $m_1 = 1$ and $k = 1$.

Although these conjectures are still open, V. Brun in 1920 was able to prove the 
amazing result that the sum of the reciprocals of the twin primes converges. We call 
this amazing since this result can be accomplished without even knowing if there are 
infinitely many twin primes. Brun’s theorem is the following: 


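Theorem 5.2.4 (Brun) If $S = \{(p, p + 2)\}$ denotes the set of twin prime pairs then
the series

$$\sum_{(p, p+2) \in S} \left(\frac{1}{p} + \frac{1}{p+2}\right)$$

converges. That is,

$$\left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \cdots$$

converges.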


Of course if there are only finitely many twin prime pairs the series would trivially 
converge. 
The value of the series

$$B = \sum_{(p, p+2) \in S} \left(\frac{1}{p} + \frac{1}{p+2}\right)$$

is called Brun's constant. A great deal of work has gone into determining the exact
value of $B$. Numerically, the value of $B$ has been estimated as (see [CP])

$$B \approx 1.902160583104\ldots.$$
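Direct summation shows how slowly the series for $B$ converges. The Python sketch below (the names `prime_flags` and `brun_partial_sum` are our own) adds $1/p + 1/(p+2)$ over the twin pairs with $p \le x$; even at $x = 10^5$ the partial sum is still well below the value of $B$ quoted above, which is why the published estimates of $B$ rely on extrapolation rather than on raw partial sums.

```python
from math import isqrt


def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 exactly when k is prime, for 0 <= k <= n (n >= 1)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def brun_partial_sum(x: int) -> float:
    """Sum of 1/p + 1/(p + 2) over twin prime pairs (p, p + 2) with p <= x."""
    flags = prime_flags(x + 2)
    return sum(1 / p + 1 / (p + 2)
               for p in range(3, x + 1) if flags[p] and flags[p + 2])
```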


Brun's theorem has been extended to other pairs of primes separated by a
constant $d > 2$. For example, if $d = 4$ the pairs of primes of the form $(p, p + 4)$ are
called cousin primes. Again it is open whether there are infinitely many of these
(for each $d$ or for any fixed $d$), but Segal [S] proved that for any given $d$ the sum of
the reciprocals of the pairs is also convergent.

In 2014 Y. Zhang [Zh] proved that there is a positive constant with the property that
infinitely many pairs of primes differ by less than that constant; his constant was 70,000,000.
In 2015 J. Maynard [Ma] reduced this bound to 600 by a different method.

Brun’s proof of Theorem 5.2.4 is technical and involves attempting to improve 
computationally on Legendre’s formula for the Sieve of Eratosthenes. His proof 
depends on the following technical results. After giving the proof of Brun’s theorem 
we will give the proofs of the lemmas. 


Lemma 5.2.1 If $n > 0$ and $m \ge 0$ then

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

In particular if $m$ is odd then

$$\sum_{i=0}^{m-1} (-1)^i \binom{n}{i} \ge 0.$$
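Both statements of Lemma 5.2.1 are easy to spot-check by machine. The short Python sketch below (function names are our own) compares the truncated alternating sum with $(-1)^m\binom{n-1}{m}$; note that `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention $\binom{n-1}{m} = 0$ for $m \ge n$.

```python
from math import comb


def truncated_alternating_sum(n: int, m: int) -> int:
    """sum_{i=0}^{m} (-1)^i * C(n, i)."""
    return sum((-1) ** i * comb(n, i) for i in range(m + 1))


def lemma_holds(n: int, m: int) -> bool:
    """First assertion of Lemma 5.2.1: the sum equals (-1)^m * C(n - 1, m)."""
    return truncated_alternating_sum(n, m) == (-1) ** m * comb(n - 1, m)
```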


The next lemma depends on symmetric polynomials and symmetric functions.
In Chapter 6 we will look at these in detail; here we just introduce what is needed for
the next result.

Suppose $y_1, \ldots, y_n$ are $n$ distinct real numbers. (Later we will look at a more
general situation.) Form the polynomial

$$p(x, y_1, \ldots, y_n) = (x - y_1) \cdots (x - y_n).$$

The $i$th elementary symmetric polynomial or $i$th elementary symmetric function
$s_i$ in $y_1, \ldots, y_n$, for $i = 1, \ldots, n$, is $(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in
$p(x, y_1, \ldots, y_n)$.

To be more specific consider $y_1, y_2, y_3$. Then

$$\begin{aligned}
p(x, y_1, y_2, y_3) &= (x - y_1)(x - y_2)(x - y_3)\\
&= x^3 - (y_1 + y_2 + y_3)x^2 + (y_1y_2 + y_1y_3 + y_2y_3)x - y_1y_2y_3.
\end{aligned}$$

Therefore, the three elementary symmetric polynomials in $y_1, y_2, y_3$ are

1. $s_1 = y_1 + y_2 + y_3$.
2. $s_2 = y_1y_2 + y_1y_3 + y_2y_3$.
3. $s_3 = y_1y_2y_3$.


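In general, the pattern of the last example holds for $y_1, \ldots, y_n$. That is,

$$\begin{aligned}
s_1 &= y_1 + y_2 + \cdots + y_n\\
s_2 &= y_1y_2 + y_1y_3 + \cdots + y_{n-1}y_n\\
s_3 &= y_1y_2y_3 + y_1y_2y_4 + \cdots + y_{n-2}y_{n-1}y_n\\
&\ \ \vdots\\
s_n &= y_1 \cdots y_n.
\end{aligned}$$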
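The definition can be made concrete with a small Python sketch (function names are our own, for illustration only). It expands $(x - y_1)\cdots(x - y_n)$ one factor at a time, reads off $s_i = (-1)^i a_i$, and compares the result with the explicit sums of products shown above.

```python
from itertools import combinations
from math import prod


def poly_from_roots(ys):
    """Coefficients of (x - y_1)...(x - y_n), highest power of x first."""
    coeffs = [1]
    for y in ys:
        # multiply by (x - y): new_i = c_i - y * c_{i-1}
        coeffs = [c - y * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def elementary_symmetric(ys):
    """[s_1, ..., s_n] with s_i = (-1)^i a_i, a_i the coefficient of x^(n-i)."""
    a = poly_from_roots(ys)
    return [(-1) ** i * a[i] for i in range(1, len(a))]


def elementary_symmetric_brute(ys):
    """s_i as the sum of all products of i distinct y's (the pattern above)."""
    return [sum(prod(c) for c in combinations(ys, i)) for i in range(1, len(ys) + 1)]
```

For instance, $(x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6$, so `elementary_symmetric([1, 2, 3])` gives `[6, 11, 6]`.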
We now state the lemma we need. 


Lemma 5.2.2 If $S_n$ is the $n$th elementary symmetric function of $s$ positive numbers
$a_1, \ldots, a_s$, $1 \le n \le s$, then

$$S_n \le \frac{S_1^n}{n!}.$$

Lemma 5.2.3 Let $d > 0$, $n > 0$. Then the number of positive integers $m \le n$ which
belong to any given residue class mod $d$ differs from $\frac{n}{d}$ by less than 1.


The following is the crucial lemma. 


Lemma 5.2.4 Let $P(x)$ denote the number of primes $p \le x$ for which $p + 2$ is prime.
Then for $x \ge 3$ we have

$$P(x) < c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2$$

where $c$ is a constant.


We can now give a proof of Brun’s theorem. 


Proof (Theorem 5.2.4) As in the statement of Lemma 5.2.4 let $P(x)$ denote the
number of primes $p \le x$ for which $p + 2$ is prime. It follows then from Lemma 5.2.4
that for $x \ge 3$ (see the exercises)

$$P(x) < k\,\frac{x}{(\ln x)^{3/2}}$$

where $k$ is a constant. Let $(p_r, p_r + 2)$ denote the $r$th twin prime pair. Then for all
$r \ge 1$ we have, since $p_r \ge r + 2$,

$$r = P(p_r) < k\,\frac{p_r}{(\ln p_r)^{3/2}} < k\,\frac{p_r}{(\ln(r + 1))^{3/2}}$$

and therefore

$$\frac{1}{p_r} < \frac{k}{r(\ln(r + 1))^{3/2}}.$$

Now it follows easily from the integral test for infinite series (see the exercises) that
the series

$$\sum_{r=1}^{\infty} \frac{1}{r(\ln(r + 1))^{3/2}}$$

converges. Therefore by the comparison test $\sum_r \frac{1}{p_r}$ converges, and since $\frac{1}{p_r + 2} < \frac{1}{p_r}$ so does

$$\sum_{r=1}^{\infty} \left(\frac{1}{p_r} + \frac{1}{p_r + 2}\right).$$


We now give the proofs of the four technical lemmas. The first three are very 
straightforward. The real difficulty lies in Lemma 5.2.4. 


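Proof (Lemma 5.2.1) We wish to prove that if $n > 0$ and $m \ge 0$ then

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

The second assertion, that if $m$ is odd then

$$\sum_{i=0}^{m-1} (-1)^i \binom{n}{i} \ge 0,$$

follows directly from the first: applied with $m - 1$ in place of $m$, the sum equals
$(-1)^{m-1}\binom{n-1}{m-1} = \binom{n-1}{m-1} \ge 0$, since $m - 1$ is even.
We prove the first assertion by induction on $m$. If $m = 0$ then

$$\sum_{i=0}^{0} (-1)^i \binom{n}{i} = \binom{n}{0} = 1 = (-1)^0 \binom{n-1}{0},$$

so it is true for $m = 0$. Suppose that

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

Then

$$\begin{aligned}
\sum_{i=0}^{m+1} (-1)^i \binom{n}{i} &= (-1)^{m+1}\binom{n}{m+1} + \sum_{i=0}^{m} (-1)^i \binom{n}{i}\\
&= (-1)^{m+1}\binom{n}{m+1} + (-1)^m \binom{n-1}{m}\\
&= (-1)^{m+1}\left(\binom{n}{m+1} - \binom{n-1}{m}\right) = (-1)^{m+1}\binom{n-1}{m+1} \quad \text{(see the exercises)}.
\end{aligned}$$

Therefore the first statement is true by induction.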


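Proof (Lemma 5.2.2) Here we wish to show that

$$S_n \le \frac{S_1^n}{n!}$$

where $S_n$ is the $n$th elementary symmetric function of $s$ positive numbers $a_1, \ldots, a_s$,
$1 \le n \le s$. Notice that $S_n$ consists of the sum of all $n$-fold products $a_{i_1} \cdots a_{i_n}$ with
$i_1 < \cdots < i_n$, taken from $a_1, \ldots, a_s$. Now consider

$$S_1^n = (a_1 + \cdots + a_s)^n.$$

There are $\binom{s}{n}$ such $n$-fold products of distinct $a_i$ in the multinomial expansion and each has
coefficient $n!$. All the remaining terms of the expansion are positive. Hence $n!\,S_n \le S_1^n$
and the result follows.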
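Lemma 5.2.2 will be applied below with $a_i = 1/p_i$. The Python sketch below checks the inequality in exact rational arithmetic for such values; it computes all $S_n$ at once with the standard recurrence $S_n \leftarrow S_n + v\,S_{n-1}$ as each new number $v$ is added. The function names are our own, illustrative choices.

```python
from fractions import Fraction
from math import factorial


def elementary_symmetric_all(values):
    """[S_0, S_1, ..., S_s] via S_n <- S_n + v * S_{n-1}, adding one value v at a time."""
    S = [Fraction(1)] + [Fraction(0)] * len(values)
    for v in values:
        for n in range(len(values), 0, -1):   # go downwards so S_{n-1} is still the old value
            S[n] += v * S[n - 1]
    return S


def lemma_5_2_2_holds(values) -> bool:
    """Check S_n <= S_1^n / n! for 1 <= n <= s (the values must be positive)."""
    S = elementary_symmetric_all(values)
    return all(S[n] <= S[1] ** n / factorial(n) for n in range(1, len(values) + 1))
```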


Proof (Lemma 5.2.3) Let $d > 0$, $n > 0$. We wish to show that the number of positive
integers $m \le n$ which belong to any given residue class mod $d$ differs from $\frac{n}{d}$ by less
than 1.

In each set of $d$ consecutive integers there is exactly one number in a given
residue class mod $d$. Up to a given positive $n$ there are $\lfloor n/d \rfloor$ complete sets of residues
mod $d$ and, if $\frac{n}{d}$ is not integral, an additional partial set of residues. Hence the number
counted in the statement of the lemma is $\frac{n}{d}$ if $\frac{n}{d}$ is integral, and either $\lfloor n/d \rfloor$ or
$\lfloor n/d \rfloor + 1$ otherwise. Therefore the number $N$ of such integers always satisfies

$$\frac{n}{d} - 1 < N < \frac{n}{d} + 1.$$
d d 


Proof (Lemma 5.2.4) Let $P(x)$ denote the number of primes $p \le x$ for which $p + 2$
is prime. Then we wish to show that for $x \ge 3$

$$P(x) < c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2$$

where $c$ is a constant. First, suppose that $x > 5$ and $y$ is chosen so that $5 < y < x$.
Let $Q(x)$ be the number of integers $n$ in the interval $y < n \le x$ for which both $n$ and
$n + 2$ are primes. Clearly then

$$P(x) \le y + Q(x). \tag{5.2.3}$$

Let $p_1 < p_2 < \cdots < p_r < \cdots$ denote the sequence of primes and suppose that
$\pi(y) = r$. Let $A(x)$ denote the number of integers $n$ for which $0 < n \le x$ and $n$ is
not congruent to either $0$ or $-2$ mod $p_i$ for $i = 2, \ldots, r$. Then

$$Q(x) \le A(x) \tag{5.2.4}$$

for every $n$ counted in $Q(x)$ is greater than $y$ and therefore greater than $p_h$ for $h \le r$,
since $\pi(y) = r$; as $n$ and $n + 2$ are primes larger than $p_h$, neither is divisible by $p_h$.
Combining (5.2.3) and (5.2.4) we get

$$P(x) \le y + A(x).$$


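Let $\Omega(d)$ denote the number of distinct prime factors of $d > 0$. If $d$ is odd and
square-free let $B(d, x)$ be the number of positive integers $n \le x$ such that for every
prime factor $p$ of $d$ either $n \equiv 0 \bmod p$ or $n \equiv -2 \bmod p$. From Lemma 5.2.3 we
have

$$\left| B(d, x) - \frac{2^{\Omega(d)}x}{d} \right| < 2^{\Omega(d)} \tag{5.2.5}$$

for the integers $n$ counted by $B(d, x)$ are exactly those lying in $2^{\Omega(d)}$ residue classes mod $d$
(two classes for each of the $\Omega(d)$ prime factors of $d$, combined by the Chinese remainder theorem),
and each class contributes a count that differs from $x/d$ by less than 1.
We next claim that

$$A(x) \le \sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) < m}} \mu(d)B(d, x) \tag{5.2.6}$$

where $m$ is an arbitrary positive odd integer.

Every $n$ with $0 < n \le x$ which is not counted in $A(x)$ satisfies $n \equiv 0 \bmod p_{i_j}$ or
$n \equiv -2 \bmod p_{i_j}$ for exactly $b \ge 1$ primes $p_{i_1}, \ldots, p_{i_b}$ with $2 \le i_1 < \cdots < i_b \le r$.
Hence such an $n$ is counted in the sum precisely for those terms $B(d, x)$ for which
$d \mid (p_{i_1} \cdots p_{i_b})$ and further $\Omega(d) < m$. Since $p_2 \cdots p_r$ is square-free, there are
$\binom{b}{i}$ such $d$ with $\Omega(d) = i$, each with $\mu(d) = (-1)^i$, so the total count of such an $n$ in the sum is

$$\sum_{\substack{d \mid p_{i_1} \cdots p_{i_b}\\ \Omega(d) < m}} \mu(d) = \sum_{i=0}^{m-1} (-1)^i \binom{b}{i} \ge 0$$

by Lemma 5.2.1, since $m$ is odd. On the other hand, every $n$ with $0 < n \le x$ which is
counted in $A(x)$ is counted exactly once in the sum, namely in the term $d = 1$ (with $\mu(1) = 1$),
since it satisfies none of the congruences defining $B(d, x)$ for $d > 1$. Combining these two
observations, the inequality (5.2.6) is proved.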
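Inequality (5.2.6) is the heart of Brun's pure sieve: truncating the inclusion–exclusion sum after the terms with $\Omega(d) < m$ gives an upper bound when $m$ is odd (and, by the same computation, a lower bound when $m$ is even). The brute-force Python sketch below illustrates this for a few small odd primes. The function names `A_count`, `B_count` and `truncated_sum` are hypothetical, and the direct counting is for illustration only.

```python
from itertools import combinations


def A_count(x: int, odd_primes: list[int]) -> int:
    """Integers 0 < n <= x with n not congruent to 0 or -2 mod p for every p."""
    return sum(1 for n in range(1, x + 1)
               if all(n % p not in (0, p - 2) for p in odd_primes))


def B_count(d_primes: tuple[int, ...], x: int) -> int:
    """Integers 0 < n <= x with n congruent to 0 or -2 mod p for every prime p | d."""
    return sum(1 for n in range(1, x + 1)
               if all(n % p in (0, p - 2) for p in d_primes))


def truncated_sum(x: int, odd_primes: list[int], m: int) -> int:
    """Sum of mu(d) B(d, x) over squarefree d | p_2 ... p_r with Omega(d) < m."""
    total = 0
    for k in range(min(m, len(odd_primes) + 1)):     # k = Omega(d)
        for d_primes in combinations(odd_primes, k):
            total += (-1) ** k * B_count(d_primes, x)  # mu(d) = (-1)^k
    return total
```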
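Combining this inequality with inequality (5.2.5) we have

$$A(x) < x\sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) < m}} \frac{\mu(d)2^{\Omega(d)}}{d} + \sum_{i=0}^{m-1} 2^i\binom{r-1}{i}.$$

First we have

$$\sum_{i=0}^{m-1} 2^i\binom{r-1}{i} \le \sum_{i=0}^{m-1} \big(2(r-1)\big)^i$$

since

$$\binom{r-1}{i} \le \frac{(r-1)^i}{i!} \le (r-1)^i.$$

But this last sum satisfies

$$\sum_{i=0}^{m-1} \big(2(r-1)\big)^i < \big(2(r-1)\big)^m < (2y)^m$$

since $r - 1 \ge 2$ (as $y > 5$) and $r < y$.
For the second part of the sum

$$\sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) < m}} \frac{\mu(d)2^{\Omega(d)}}{d} = \sum_{d \mid p_2 \cdots p_r} \frac{\mu(d)2^{\Omega(d)}}{d} - \sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) \ge m}} \frac{\mu(d)2^{\Omega(d)}}{d}.$$

If $m > r$ the last term is zero. We have by Euler expansion

$$\begin{aligned}
\sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) < m}} \frac{\mu(d)2^{\Omega(d)}}{d} &= \prod_{2 < p \le p_r}\left(1 - \frac{2}{p}\right) - \sum_{n=m}^{r-1} (-1)^n 2^n \sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) = n}} \frac{1}{d}\\
&= \prod_{2 < p \le p_r}\left(1 - \frac{2}{p}\right) - \sum_{n=m}^{r-1} (-1)^n 2^n S_n,
\end{aligned}$$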


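where $S_n$ is the $n$th elementary symmetric polynomial in

$$\frac{1}{p_2}, \ldots, \frac{1}{p_r}.$$

From Lemma 5.2.2 and since $n!\,e^n > n^n$ (see the exercises) it follows that

$$S_n \le \frac{S_1^n}{n!} < \left(\frac{eS_1}{n}\right)^n < \left(\frac{c\ln\ln y}{n}\right)^n$$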




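where $c$ is a constant; here we use $S_1 = \sum_{i=2}^{r} 1/p_i \le \sum_{p \le y} 1/p = \ln\ln y + O(1)$. Then

$$\left|\sum_{n=m}^{r-1} (-1)^n 2^n S_n\right| < \sum_{n=m}^{\infty} \left(\frac{c_1\ln\ln y}{n}\right)^n$$

with $c_1 = 2c$ another constant. It follows that if

$$m > 2c_1\ln\ln y$$

then

$$\left|\sum_{n=m}^{r-1} (-1)^n 2^n S_n\right| < \sum_{n=m}^{\infty} \frac{1}{2^n} = \frac{1}{2^{m-1}}.$$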


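Combining this with the earlier inequalities, and with the estimate

$$\prod_{2 < p \le y}\left(1 - \frac{2}{p}\right) \le \frac{c_2}{(\ln y)^2},$$

which follows from Mertens' theorem since $1 - \frac{2}{p} \le \left(1 - \frac{1}{p}\right)^2$ (recall that $p_r$ is the largest
prime $\le y$), we obtain

$$\sum_{\substack{d \mid p_2 \cdots p_r\\ \Omega(d) < m}} \frac{\mu(d)2^{\Omega(d)}}{d} < \frac{c_2}{(\ln y)^2} + \frac{1}{2^{m-1}}$$

with $c_2$ a constant. Therefore

$$P(x) < y + \frac{c_2x}{(\ln y)^2} + \frac{x}{2^{m-1}} + (2y)^m.$$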
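These inequalities are true if $5 < y < x$ and $m$ is an odd integer with $m > 2c_1\ln\ln y$; enlarging
$c_1$ if necessary, we may assume $c_1 \ge 2$. If we choose

$$y = x^{1/(4c_1\ln\ln x)} \quad \text{and} \quad m = 2\lfloor c_1\ln\ln x\rfloor - 1$$

then these conditions are met for $x$ large enough, and so the derived inequalities hold. Since
$m \le 2c_1\ln\ln x$ and $m - 1 \ge 2c_1\ln\ln x - 4$, it follows that

$$P(x) < c_4\left(y + \frac{x}{(\ln y)^2} + \frac{x}{2^{2c_1\ln\ln x}} + (2y)^{2c_1\ln\ln x}\right)$$

for $x \ge c_5$ with $c_5$ some large enough constant.
Each of the terms in the parentheses satisfies

$$\le c_6\,\frac{x}{(\ln x)^2}(\ln\ln x)^2$$

for some constant $c_6$ holding for all of them. To see this we have first

$$y \le \sqrt{x} \quad \text{for } x \text{ large}.$$

Further

$$\frac{x}{(\ln y)^2} = \frac{x}{(\ln x)^2}(k_2\ln\ln x)^2, \qquad k_2 = 4c_1,$$

and

$$\frac{x}{2^{2c_1\ln\ln x}} = \frac{x}{(\ln x)^{2c_1\ln 2}} \le \frac{x}{(\ln x)^2}$$

since $c_1 \ge 2$ and $2\ln 2 > 1$. Finally

$$(2y)^{2c_1\ln\ln x} = e^{2c_1\ln\ln x\left(\frac{\ln x}{4c_1\ln\ln x} + \ln 2\right)} = e^{\frac{1}{2}\ln x + 2c_1\ln 2\,\ln\ln x} < c_7e^{\frac{3}{4}\ln x} = c_7x^{3/4}$$

for some constant $c_7$. Since $\sqrt{x}$ and $x^{3/4}$ are both at most $x(\ln\ln x)^2/(\ln x)^2$ for large $x$,
for $x \ge c_5$, large enough, we have

$$P(x) < 4c_4c_6\,\frac{x}{(\ln x)^2}(\ln\ln x)^2.$$

To obtain the result for all $x \ge 3$, note that on the range $3 \le x < c_5$ the function $P(x)$ is
bounded while $x(\ln\ln x)^2/(\ln x)^2$ is bounded below by a positive constant. Enlarging the
constant, we get a constant $C$ such that for $x \ge 3$,

$$P(x) < C\,\frac{x}{(\ln x)^2}(\ln\ln x)^2,$$

proving the lemma.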


5.3 Primality Testing and Prime Records


As we have seen in the previous two sections it is theoretically very straightforward,
using either the direct method of trial division or the Sieve of Eratosthenes, to test an
integer for primality. The problem is that for large integers $n$ these methods become
computationally intractable. Hence direct trial division and
the Sieve of Eratosthenes can only be used for small integers, and for large
integers other methods must be employed. We should note before going further that
in number theory the concepts of small and large are very much relative to the type of
computing machinery one is using. Numbers as large as 10,000,000,000 can be
tested very easily, even on small computers, using the Sieve of Eratosthenes. In
terms of computational asymptotic number theory, $10^{10}$ is still small. Similarly, for
human computation the total number of atoms in the universe is massive; this
number is estimated to have roughly 79 digits. However, 79-digit integers are
only considered moderate in asymptotic computational number theory, which may
want to handle integers with hundreds or even thousands of digits. Therefore what
is needed are tests for primality which will handle some of these gigantic
integers.

A primality test is then an algorithm which inputs a positive integer $n$ and outputs
whether it is prime or not. These tests can be subclassified as either deterministic
primality tests or probabilistic primality tests. In a deterministic test an integer $n$
is inputted and the output is either "yes, the integer is prime" or "no, the integer is not prime."
Hence both the direct method of trial division and the Sieve of Eratosthenes are
deterministic tests.

A nondeterministic primality test takes an inputted integer $n$ and returns either
"no, it is not prime" or "it may be a prime." A probabilistic primality test is a
nondeterministic test that returns either that the inputted integer is not a prime or that it is
probably a prime to some given degree of accuracy. There are various tests (that we will look at
in the next section) which can give this accuracy to as high a probability as desired.
Numbers that pass a probabilistic primality test are called probable primes. For use
in cryptography, knowing that an integer is prime with high probability is often just as
good as knowing that it is definitely prime. For this reason probable primes with a high
degree of probability are called industrial grade primes, a term originally coined
by H. Cohen.

The majority of nondeterministic tests are based on either Fermat's theorem or
some variation of it. Recall from Chapter 2 Fermat's (Little) theorem (Corollary
2.4.2):


Theorem 5.3.1 (Fermat's Theorem) If $p$ is a prime and $p \nmid a$ then
$a^{p-1} \equiv 1 \bmod p$.


This was a special case of the more general Euler’s theorem, which we will also 
need. 


Theorem 5.3.2 (Euler's Theorem) If $(a, n) = 1$ then

$$a^{\phi(n)} \equiv 1 \bmod n.$$

Hence if $n$ is an integer and $a$ is relatively prime to $n$ with $a^{n-1}$ not congruent to
$1 \bmod n$ then $n$ cannot be prime. This is usually called the Fermat Probable Prime
Test and was introduced briefly in Chapter 2. Basically, given $n$ we find an $a$ with
$(a, n) = 1$ and compute $a^{n-1} \bmod n$. If this value is not $1 \bmod n$ then $n$ is not prime. If
it is congruent to $1 \bmod n$ then $n$ may be prime. In the latter case, by trying different
values for $a$ we can assign a probability value. We will make this precise in the next
section. For now we will state the basic Fermat Probable Prime Test and present an
example.

The Fermat Probable Prime Test: Suppose $n$ is an inputted integer. Find an $a$
with $(a, n) = 1$. Compute $a^{n-1} \bmod n$. If this value is not $1 \bmod n$ then $n$ is not prime.
If this value is $1 \bmod n$ then $n$ may be prime.
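In a language with built-in modular exponentiation the test is a one-liner. The Python sketch below is a minimal illustration (the name `fermat_probable_prime` is ours); it uses the three-argument form of Python's built-in `pow`, which performs repeated squaring modulo $n$, and it refuses bases that are not coprime to $n$, as the test requires.

```python
from math import gcd


def fermat_probable_prime(n: int, a: int) -> bool:
    """Fermat Probable Prime Test. False: n is certainly composite.
    True: n may be prime (it is prime or a pseudoprime to the base a)."""
    if n < 3 or gcd(a, n) != 1:
        raise ValueError("need n > 2 and (a, n) = 1")
    return pow(a, n - 1, n) == 1
```

A `True` answer is only evidence: for example, the composite number $341 = 11 \cdot 31$ passes the test for $a = 2$ but fails it for $a = 3$.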

EXAMPLE 5.3.1 Test whether 11387 is prime.

This integer is relatively small so even by trial division determining whether it is
prime is easy. We use the Fermat method just to illustrate the technique.

Start with $a = 2$ and compute $2^{11387} \bmod 11387$. The basic idea is to use repeated
squarings to reduce the congruence. All the congruences are modulo 11387.

$$\begin{aligned}
2^{13} &= 8192 \equiv -3195 \implies 2^{26} \equiv 10208025 \equiv 5273\\
&\implies 2^{52} \equiv 8862 \equiv -2525 \implies 2^{104} \equiv 10292 \equiv -1095\\
&\implies 2^{208} \equiv 3390 \implies 2^{416} \equiv 2617 \implies 2^{832} \equiv 5102.
\end{aligned}$$

Continuing in this manner we eventually get

$$2^{11388} \equiv 8642 \implies 2^{11387} \equiv 4321.$$

From Fermat's theorem, if $n$ is prime we would have $a^{n-1} \equiv 1 \bmod n$ and therefore
$a^n \equiv a \bmod n$. Here 4321 is not congruent to 2 mod 11387. Therefore 11387 is not
prime.

For this integer using trial division it is easy to obtain the factorization

$$11387 = (59)(193).$$

However even with an integer this size at least a calculator is necessary.
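The computation in Example 5.3.1 can be reproduced mechanically. The Python sketch below (function names are our own) shows the squaring chain starting from $2^{13}$ used above, together with a left-to-right square-and-multiply routine. This is one standard way to organize repeated squaring, not necessarily the exact sequence of steps used in the example.

```python
def square_chain(base: int, start_exp: int, steps: int, n: int) -> list[int]:
    """Residues of base^(start_exp * 2^k) mod n for k = 0, ..., steps."""
    value = pow(base, start_exp, n)
    chain = [value]
    for _ in range(steps):
        value = value * value % n          # squaring doubles the exponent
        chain.append(value)
    return chain


def power_mod(base: int, exp: int, n: int) -> int:
    """Left-to-right binary exponentiation (square-and-multiply)."""
    result = 1 % n
    for bit in bin(exp)[2:]:
        result = result * result % n       # shift the exponent left by one bit
        if bit == "1":
            result = result * base % n     # then add the current bit
    return result
```

Here `square_chain(2, 13, 6, 11387)` reproduces the residues 8192, 5273, 8862, 10292, 3390, 2617, 5102 listed above. Also, `power_mod(2, 11387, 11387)` returns 4321, which is not 2, so 11387 is composite.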
In 1891 Lucas gave the following extension of Fermat’s theorem which actually 
makes the Fermat Test deterministic. 


Theorem 5.3.3 (Lucas) Let $n > 1$. If for every prime factor $p$ of $n - 1$ there exists
an integer $a$ such that

1. $a^{n-1} \equiv 1 \bmod n$ and
2. $a^{(n-1)/p} \not\equiv 1 \bmod n$

then $n$ is prime.


Proof Suppose $n$ satisfies the conditions of the theorem. To show that $n$ is prime
we will show that $\phi(n) = n - 1$ where $\phi$ is the Euler phi function. Since in general
$\phi(n) \le n - 1$, to show equality we will show that under the above conditions $n - 1$
divides $\phi(n)$. Suppose not. Then there exists a prime $p$ such that $p^r$ is the exact power of $p$
dividing $n - 1$, for some exponent $r \ge 1$, but $p^r$ does not divide $\phi(n)$. For this prime $p$, there exists
an integer $a$ satisfying the conditions of the theorem; by the first condition $(a, n) = 1$. Let $m$ be the
order of $a$ modulo $n$. Then $m$ divides $n - 1$ since the order of an element divides any power which
equals 1 (see Chapter 2). However by the second condition in the theorem and for
the same reason, $m$ does not divide $(n-1)/p$. Since $p^r$ exactly divides $n - 1$, it follows that $p^r$
divides $m$, which divides $\phi(n)$ by Euler's theorem, contradicting our assumption. Hence
$n - 1 = \phi(n)$ and therefore $n$ is prime.
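The Lucas test can be phrased as the search for a certificate: one base $a$ for each prime $p \mid n - 1$. The Python sketch below is our own illustration (the names `distinct_prime_factors` and `lucas_certificate` are hypothetical). It searches the bases exhaustively, which is only sensible for small $n$, and it factors $n - 1$ by trial division. As the text notes, that factorization is exactly what makes the test impractical for large $n$.

```python
def distinct_prime_factors(m: int) -> list[int]:
    """Distinct prime divisors of m, by trial division."""
    ps, d = [], 2
    while d * d <= m:
        if m % d == 0:
            ps.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        ps.append(m)
    return ps


def lucas_certificate(n: int):
    """Return {p: a} with a^(n-1) = 1 and a^((n-1)/p) != 1 (mod n) for every prime
    p | n - 1, or None if for some p no such a exists among 2, ..., n - 1."""
    certificate = {}
    for p in distinct_prime_factors(n - 1):
        for a in range(2, n):
            if pow(a, n - 1, n) == 1 and pow(a, (n - 1) // p, n) != 1:
                certificate[p] = a
                break
        else:                      # no base works for this p
            return None
    return certificate
```

By Theorem 5.3.3, a certificate can only exist when $n$ is prime. Conversely, for a prime $n$ any primitive root modulo $n$ works for every $p$.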


Although this Lucas test is deterministic, it is, in most cases, no more
computationally feasible than trial division or sieving since it depends on the factorization of
$n - 1$. In general, factorization is even more difficult than solely testing for primality.
Therefore even here further methods are necessary. We note that the idea in the Lucas 
test has been quite effective in developing methods for testing Fermat and Mersenne 
numbers for primality. We will return to these in Section 5.3.2. 

The majority of probabilistic primality tests are based on the Fermat test or some
variation of it. The basic idea is that if an integer passes the test for a base $b$ (so that it is
a probable prime) then we try another base. Doing this, there is then a technique to attach
a probability tied to the number of bases attempted. We will make this precise in the
next section. For now we would like to look at a new (2003) deterministic algorithm
which answered a major open problem in both number theory and computer science.

Primality testing is essentially a computational problem. Therefore a primality
test raises questions about the accompanying algorithm's computational speed and
computational complexity. For these types of number theoretic algorithms the
computational complexity is measured in terms of functions of the input length, which
here is roughly the number of digits of the inputted integer. The Sieve of Eratosthenes
requires, for an inputted integer $n$, roughly the same order $n$ of operations. If $n$ has
$\log_{10} n$ digits then the Sieve requires $O(10^{\log_{10} n})$ operations to prove primality. We
say that this algorithm is of exponential time in terms of the input length. The big
open question was whether there exists a deterministic algorithm which is of
polynomial time in the input length. This means that for this algorithm there is a positive
integer $d$ such that the number of operations in the algorithm to prove primality is
$O((\ln n)^d)$. Earlier, Miller and Rabin had shown that the Miller–Rabin test, which
we will describe in the next section, can be made deterministic. Further, it is of
polynomial time if one accepts as true the extended Riemann hypothesis (see Chapter 4).
However prior to 2003 it was an open question whether there was a deterministic
algorithm for primality which could be shown to be of polynomial time without using
any unproved conjectures.

In 2003, M. Agrawal and two of his students, N. Kayal and N. Saxena, developed 
an algorithm, now called the AKS Algorithm, which was deterministic and could be 
proved to be of polynomial time. The result was even more spectacular since it was 
accomplished with relatively elementary methods. The basic algorithm depends on 
two rather straightforward extensions of Fermat’s theorem. This result has of course 
generated a great deal of attention and much has already been written about it. We 
refer the reader to the articles [Bo] and [Be] for a more complete discussion of the 
algorithm and its development. Because of the timeliness and excitement this result 
has generated we will present the basic arguments in the paper of [AKS]. This will 
be done in Section 5.5 at the conclusion of this chapter. 

The first result needed is the following which was well known in the theory of 
finite fields. 


Theorem 5.3.4 Suppose $(a, n) = 1$ with $n > 1$. Then $n$ is a prime if and only if

$$(x - a)^n \equiv x^n - a \bmod n$$

in the ring of polynomials $\mathbb{Z}[x]$.


Proof Suppose $n$ is prime. If $n = 2$ the statement holds. Now we assume that $n$ is an
odd prime. From the binomial theorem

$$(x - a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k (-a)^{n-k}.$$

If $n$ is prime and $k \ne 0, n$ then $\binom{n}{k} \equiv 0 \bmod n$ (see the exercises). Therefore, since $n$ is odd,

$$(x - a)^n = x^n - a^n \text{ in } \mathbb{Z}_n[x].$$

But from Fermat's theorem $a^n \equiv a \bmod n$ and so the result follows.
Conversely, if $n$ is composite then it has a prime divisor $p$ with $0 < p < n$. Suppose $p^k$ is the
highest power of $p$ dividing $n$. Then $p^k$ does not divide $\binom{n}{p}$, and since $(a, n) = 1$ it does not
divide $\binom{n}{p}a^{n-p}$ either. Therefore in the binomial expansion of $(x - a)^n$ the coefficient of the
$x^p$ term is not zero mod $n$ and hence

$$(x - a)^n \ne x^n - a \text{ in } \mathbb{Z}_n[x].$$
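Theorem 5.3.4 can be verified coefficient by coefficient for small $n$. The Python sketch below (the name `binomial_criterion` is ours) compares the coefficient $\binom{n}{k}(-a)^{n-k}$ of $x^k$ with the corresponding coefficient of $x^n - a$, all modulo $n$. It needs all $n + 1$ coefficients, which illustrates why the criterion by itself is no faster than Fermat's test.

```python
from math import comb, gcd


def binomial_criterion(n: int, a: int) -> bool:
    """Theorem 5.3.4: does (x - a)^n equal x^n - a in Z_n[x]?"""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n > 1 and (a, n) = 1")
    for k in range(n + 1):
        # coefficient of x^k in (x - a)^n is C(n, k) * (-a)^(n - k)
        coeff = comb(n, k) * pow(-a, n - k, n) % n
        target = 1 if k == n else (-a if k == 0 else 0)
        if coeff != target % n:
            return False
    return True
```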


This theorem is computationally just as difficult to use as Fermat's theorem in
proving primality. Agrawal, Kayal, and Saxena then proved the following extension
of the above result which leads to the AKS algorithm. To state the theorem we need
the following notation. If $p(x)$, $q(x)$ are integral polynomials then we say

$$p(x) \equiv q(x) \bmod (x^r - 1, n)$$

if the remainders of $p(x)$ and $q(x)$ after division by $x^r - 1$ are equal (equal coefficients)
modulo $n$.


Theorem 5.3.5 (AKS) Suppose that $n$ is a natural number and $s \le n$. Suppose that
$q, r$ are primes satisfying $q \mid (r - 1)$, $n^{(r-1)/q}$ is not congruent to 0 or 1 modulo $r$, and

$$\binom{q + s - 1}{s} \ge n^{2\lfloor\sqrt{r}\rfloor}.$$

If for all $a$ with $1 \le a < s$

1. $(a, n) = 1$, and
2. $(x - a)^n \equiv x^n - a \bmod (x^r - 1, n)$,

then $n$ is a prime power.


The proof of this theorem is not difficult but requires some results from the theory 
of cyclotomic fields which are outside the scope of this book. Hence at this point we 
omit the proof. However as mentioned, the basic arguments in the paper of [AKS] 
will be presented in Section 5.5. The most difficult part of the proof is showing that
given $n$ there do exist primes $q$, $r$ satisfying the conditions in the theorem.

From Theorem 5.3.5 we get the following algorithm (the AKS algorithm), which
is deterministic.

The AKS Algorithm: Input an integer $n > 1$.
Step (1): Determine if $n = a^b$ for some integers $a$, $b$. If so and $b > 1$ output
composite and done.
Step (2): Choose $q$, $r$, $s$ satisfying the hypotheses of Theorem 5.3.5.
Step (3): For $a = 1, 2, \ldots, s - 1$ do the following:
If $a > 1$ and $a$ is a divisor of $n$, output composite and done.
If $(x - a)^n$ is not congruent to $x^n - a \bmod (x^r - 1, n)$, output composite and
done.
Step (4): Output prime.


Although the algorithm is deterministic, it is not clear that it can be accomplished
in polynomial time. What is necessary is to show that polynomial bounds can be
placed on determining $q$, $r$, $s$. This can be done. The following is a program written
in pseudocode which can be implemented even on a relatively small computer and which
places the appropriate bounds. It is also necessary to have an algorithm to implement
the first step. This can be done in linear time.


AKS Algorithm Program: Input an integer $n > 1$.

    if (n = a^b for some natural numbers a, b with b > 1) output COMPOSITE;
    r = 2;
    while (r < n) do {
        if (gcd(n, r) != 1) output COMPOSITE;
        if (r is prime) {
            let q be the largest prime factor of r - 1;
            if (q >= 4 sqrt(r) log_2 n) and (n^((r-1)/q) != 1 mod r)
                break;
        }
        r = r + 1;
    }
    for a = 1 to 2 sqrt(r) log_2 n
        if ((x - a)^n is not congruent to x^n - a mod (x^r - 1, n)) output COMPOSITE;
    output PRIME;
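The program above translates almost line by line into Python. The sketch below is our own illustration, and all function names are hypothetical. Polynomials modulo $(x^r - 1, n)$ are stored as coefficient lists of length $r$, and $(x - a)^n$ is computed by repeated squaring. One assumption departs from the pseudocode: if the loop runs through every $r < n$ without finding a common factor, we stop and report PRIME, a shortcut that appears in later versions of the algorithm. Otherwise the final loop would work modulo $x^n - 1$, which is very slow. For small $n$ the loop never breaks, so the program degenerates into trial division, and the polynomial step only matters for much larger inputs.

```python
from math import gcd, isqrt, log2, sqrt


def integer_root(n: int, b: int) -> int:
    """Largest c with c**b <= n (exact integer binary search)."""
    lo, hi = 1, 1 << (n.bit_length() // b + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** b <= n else (lo, mid - 1)
    return lo


def is_perfect_power(n: int) -> bool:
    """Is n = c**b for integers c >= 2, b >= 2?"""
    return any(integer_root(n, b) ** b == n for b in range(2, n.bit_length() + 1))


def is_small_prime(m: int) -> bool:
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))


def largest_prime_factor(m: int) -> int:
    """Largest prime factor of m (1 if m == 1), by trial division."""
    largest, d = 1, 2
    while d * d <= m:
        while m % d == 0:
            largest, m = d, m // d
        d += 1
    return m if m > 1 else largest


def polymul_mod(f, g, r, n):
    """f * g modulo (x^r - 1, n); polynomials are coefficient lists of length r."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                h[(i + j) % r] += fi * gj
    return [c % n for c in h]


def aks_congruence_holds(n: int, a: int, r: int) -> bool:
    """Is (x - a)^n congruent to x^n - a mod (x^r - 1, n)? Requires r >= 2."""
    base = [0] * r
    base[0], base[1] = (-a) % n, 1
    power, e = [1] + [0] * (r - 1), n          # power starts as (x - a)^0
    while e:                                   # square-and-multiply
        if e & 1:
            power = polymul_mod(power, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    rhs = [0] * r
    rhs[n % r] = 1
    rhs[0] = (rhs[0] - a) % n
    return power == rhs


def aks_v1(n: int) -> bool:
    """Sketch of the AKS program above; True means PRIME."""
    if n < 2:
        raise ValueError("n must be > 1")
    if is_perfect_power(n):
        return False                           # COMPOSITE
    r = 2
    while r < n:
        if gcd(n, r) != 1:
            return False                       # COMPOSITE
        if is_small_prime(r):
            q = largest_prime_factor(r - 1)
            if q >= 4 * sqrt(r) * log2(n) and pow(n, (r - 1) // q, r) != 1:
                break
        r += 1
    if r >= n:
        return True      # assumption: every 2 <= r < n is coprime to n, so n is prime
    for a in range(1, int(2 * sqrt(r) * log2(n)) + 1):
        if not aks_congruence_holds(n, a, r):
            return False                       # COMPOSITE
    return True                                # PRIME
```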


The crucial thing is that by determining these bounds it makes the algorithm run 
in polynomial time. 


Theorem 5.3.6 (AKS) The AKS algorithm runs in $O((\log_2 n)^{12} f(\log_2\log_2 n))$ time, where $f$ is
a polynomial. That is, the time to run this algorithm is bounded by a constant times the number of
digits to the 12th power times a polynomial in the log of the number of digits.


The proof of the AKS algorithm has been refined by several people (see [Be]) and 
it has been conjectured that it actually has polynomial running time O((log, n)°). 

In theory the AKS algorithm should be the fastest running primality tester. However,
computational complexity is only a theoretical statement as $n \to \infty$. In practice,
at the present time, several of the existing algorithms actually run faster. However, 
the implementation of the AKS algorithm will probably improve. As mentioned, in 
Section 5.5 we will give the proof of this theorem. In the next section we introduce 
the ideas behind the probabilistic primality tests. 


5.3.1 Pseudo-Primes and Probabilistic Testing 


In this section we present two probabilistic primality tests: the Solovay–Strassen
test and the Miller–Rabin test. The basic idea in both of these is to test, for an
inputted integer $n$, a sequence of bases in the Fermat test. The hope is that a base
will be located for which the test fails. In this case the number is not prime. If no
such base is found, a probability that the number is prime can be assigned, determined by
the number of bases tested. First we introduce some necessary concepts.


Definition 5.3.1 Let $n$ be a composite integer. If $b > 1$ with $(n, b) = 1$ then $n$ is a
pseudoprime to the base $b$ if $b^{n-1} \equiv 1 \bmod n$.


Hence $n$ is a pseudoprime to the base $b$ if it passes the Fermat test and hence is a
probable prime.
EXAMPLE 5.3.1.1 25 is a pseudoprime to the base 7. To see this notice that

$$7^2 = 49 \equiv -1 \bmod 25.$$

This implies that $7^4 \equiv 1 \bmod 25$ and hence $7^{24} = (7^4)^6 \equiv 1^6 = 1 \bmod 25$.
Notice that 25 is not a pseudoprime to the base 2 or 3.
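A brute-force search makes Definition 5.3.1 concrete. The Python sketch below, whose function names are our own, lists the bases to which a given $n$ is a pseudoprime and the composite pseudoprimes to a fixed base below a bound. For $n = 25$ we have $b^{24} \equiv b^4 \bmod 25$ for every unit $b$, so the bases are exactly the solutions of $b^4 \equiv 1$, namely $b = 7, 18, 24$.

```python
from math import gcd, isqrt


def is_composite(n: int) -> bool:
    return n > 3 and any(n % d == 0 for d in range(2, isqrt(n) + 1))


def pseudoprime_bases(n: int) -> list[int]:
    """All bases 1 < b < n with (b, n) = 1 and b^(n-1) = 1 (mod n)."""
    return [b for b in range(2, n) if gcd(b, n) == 1 and pow(b, n - 1, n) == 1]


def base_pseudoprimes(b: int, limit: int) -> list[int]:
    """Composite n < limit with (n, b) = 1 that are pseudoprimes to the base b."""
    return [n for n in range(2, limit)
            if is_composite(n) and gcd(n, b) == 1 and pow(b, n - 1, n) == 1]
```

The smallest pseudoprimes to the base 2 are 341, 561, 645, 1105, 1387, 1729 and 1905.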


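Theorem 5.3.7 For each base $b > 1$ there exist infinitely many pseudoprimes to
the base $b$.


Proof Suppose $b > 1$. We show that if $p$ is any odd prime not dividing $b^2 - 1$ then
the integer $n = \frac{b^{2p} - 1}{b^2 - 1}$ is a pseudoprime to the base $b$. Note that for this $n$ we have

$$n = \frac{b^{2p} - 1}{b^2 - 1} = \frac{b^p - 1}{b - 1} \cdot \frac{b^p + 1}{b + 1}$$

so that $n$ is composite, both factors being integers greater than 1.

Given $b$, from Fermat's theorem we have $b^p \equiv b \bmod p$ and hence $b^{2p} \equiv b^2 \bmod p$.
Now $n - 1 = \frac{b^{2p} - b^2}{b^2 - 1}$ and since $p$ does not divide $b^2 - 1$ by assumption it follows
that $p$ divides $n - 1$.

Further

$$n - 1 = b^{2p-2} + b^{2p-4} + \cdots + b^2.$$

Therefore $n - 1$ is a sum of an even number ($p - 1$) of terms of the same parity so $n - 1$
must be even. It follows that $2p$ divides $n - 1$. Hence $b^{2p} - 1$ is a divisor of $b^{n-1} - 1$.
However $n$ divides $b^{2p} - 1$, since $(b^2 - 1)n = b^{2p} - 1$, so

$$b^{2p} - 1 \equiv 0 \bmod n \implies b^{n-1} - 1 \equiv 0 \bmod n.$$

In particular $(n, b) = 1$. Therefore $n$ is a pseudoprime to the base $b$. Since there are infinitely
many primes $p$ not dividing $b^2 - 1$, and distinct $p$ give distinct $n$, this proves the theorem.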
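The construction in the proof is explicit, so it can be checked directly. In the Python sketch below (function names are our own), `check_construction` verifies each step of the argument for a given $b$ and $p$: the factorization showing that $n$ is composite, the divisibility $2p \mid n - 1$, and the pseudoprime congruence. For $b = 2$ and $p = 5$ the construction gives the classical example $n = 341$.

```python
from math import gcd


def pseudoprime_from_prime(b: int, p: int) -> int:
    """n = (b^(2p) - 1) / (b^2 - 1) from the proof of Theorem 5.3.7."""
    return (b ** (2 * p) - 1) // (b * b - 1)


def check_construction(b: int, p: int) -> bool:
    """Check each step of the proof for an odd prime p not dividing b^2 - 1."""
    n = pseudoprime_from_prime(b, p)
    f1 = (b ** p - 1) // (b - 1)
    f2 = (b ** p + 1) // (b + 1)
    return (f1 * f2 == n and f1 > 1 and f2 > 1     # n is composite
            and (n - 1) % (2 * p) == 0             # 2p divides n - 1
            and gcd(n, b) == 1
            and pow(b, n - 1, n) == 1)             # pseudoprime to the base b
```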


Although there are infinitely many pseudoprimes they are not that common. It 
has been shown, for example, that there are only 21,853 pseudoprimes to the base 
2 among the first 25,000,000,000 integers. Hence there is a good chance that if a 
number, especially a large number, passes a test as a pseudoprime, then it is really 
a prime. The question becomes how to make this chance or probability precise. Lists
of many pseudoprimes can be found on various Internet websites (see [PP]).

From simple congruences the following is clear. 
Lemma 5.3.1 If $n$ is a pseudoprime to the base $b_1$, and also a pseudoprime to the
base $b_2$, then it is a pseudoprime to the base $b_1b_2$.


Probabilistic methods proceed by testing $n$ to a base $b_1$. If it is not a pseudoprime
then it is composite and we are done. If it is a pseudoprime, test a second base $b_2$ and
so on, in the hope of finding a base where it is not a pseudoprime. However there do
exist composite numbers which are pseudoprimes to every base relatively prime to them.


Definition 5.3.2 A composite integer $n$ is a Carmichael number if $n$ is a pseudoprime
to each base $b > 1$ with $(n, b) = 1$.


If $n > 3$ is a Carmichael number then $n$ must be odd. To see this suppose that $n$
is even. We have $(n - 1, n) = 1$ and since $n$ is a Carmichael number $(n - 1)^{n-1} \equiv 1 \bmod n$.
However $(n - 1)^{n-1} \equiv (-1)^{n-1} = -1 \bmod n$ since $n - 1$ is odd. Hence $n \mid 2$, which is a
contradiction since $n > 3$. It follows that $n$ must be odd.

The Carmichael numbers can be completely classified. Interestingly this was done 
even before the existence of Carmichael numbers was shown. The following is called 
the Korselt criterion after A. Korselt. 


Theorem 5.3.8 An odd composite number n is a Carmichael number if and only if 
n is squarefree and (p — 1)|(n — 1) for every prime p dividing n. 


Proof Suppose that $n$ is odd and composite. We first show that if a number $n$ is not
squarefree then it cannot be a Carmichael number.

Suppose that $n$ is not squarefree. Then there exists a prime $p$ with $p^2 \mid n$. From
Theorem 2.4.14 the multiplicative group of $\mathbb{Z}_{p^2}$ is cyclic (that is, there exists a primitive
element) and hence there is a multiplicative generator $g \bmod p^2$. Since $\phi(p^2) = p(p - 1)$
we have $g^{p(p-1)} \equiv 1 \bmod p^2$ and this is the least power of $g$ that is congruent
to $1 \bmod p^2$. Now let $m = p_1 \cdots p_k$ where $p_1, \ldots, p_k$ are the other primes besides $p$
dividing $n$ (if there are none, take $m = 1$). Choose a solution $b > 1$ to the pair of congruences

$$b \equiv g \bmod p^2$$
$$b \equiv 1 \bmod m$$

which exists from the Chinese remainder theorem. Then $(b, n) = 1$, and since $b \equiv g \bmod p^2$ it follows
that $b$ also has multiplicative order $p(p - 1) \bmod p^2$. Suppose $n$ was a Carmichael
number. Then $n$ would be a pseudoprime to the base $b$ and hence

$$b^{n-1} \equiv 1 \bmod n.$$

This implies that $p(p - 1) \mid (n - 1)$ from the multiplicative order of $b$. However since $p \mid n$
we have $n - 1 \equiv -1 \bmod p$. On the other hand if $p(p - 1) \mid (n - 1)$ we have $n - 1 \equiv 0 \bmod p$,
a contradiction. Therefore $n$ cannot be a pseudoprime to the base $b$ and hence
is not a Carmichael number.
Now suppose that $n$ is squarefree, so that $n = p_1 p_2 \cdots p_k$ with $k \ge 2$ and the $p_i$ distinct primes. Suppose first that $(p_i - 1) \mid (n - 1)$ for $i = 1, \ldots, k$ and suppose that $(b, n) = 1$. Then by Fermat's little theorem

$$b^{n-1} = \left(b^{p_i - 1}\right)^{(n-1)/(p_i - 1)} \equiv 1 \pmod{p_i}, \qquad i = 1, 2, \ldots, k.$$

Hence

$$b^{n-1} \equiv 1 \pmod{p_1 \cdots p_k}.$$

That is,

$$b^{n-1} \equiv 1 \pmod n.$$

Therefore $n$ is a pseudoprime to the base $b$, and since $b$ was arbitrary with $(b, n) = 1$ it follows that $n$ is a Carmichael number.

Conversely suppose that $n = p_1 \cdots p_k$ is a Carmichael number. Let $p_i$ be one of these primes and suppose that $g$ is a generator of the multiplicative group of $\mathbb{Z}_{p_i}$. Recall, as in the proof of the squarefree property, that this group is cyclic. Hence $g$ has multiplicative order $p_i - 1$ mod $p_i$. Now let $b$ be a solution to the pair of congruences

$$b \equiv g \pmod{p_i}, \qquad b \equiv 1 \pmod{n/p_i}.$$

Then $b$ also has multiplicative order $p_i - 1$ mod $p_i$. Further, since $(b, p_i) = 1$ and $(b, n/p_i) = 1$, it follows that $(b, n) = 1$. Since $n$ is a Carmichael number it is a pseudoprime to the base $b$ and hence

$$b^{n-1} \equiv 1 \pmod n \implies b^{n-1} \equiv 1 \pmod{p_i}.$$

It follows that $(p_i - 1) \mid (n - 1)$, proving the theorem.


Corollary 5.3.1 A Carmichael number must be divisible by at least 3 distinct primes.


Proof Suppose that $n$ is a Carmichael number. Then from the proof of the previous theorem $n = p_1 \cdots p_k$ with $k \ge 2$ and the $p_i$ distinct primes. We must show that $k > 2$. Suppose that $n = pq$ with $p < q$, $p$ and $q$ primes. Since $n$ is a Carmichael number, from the previous theorem $(q-1) \mid (n-1)$. However

$$n - 1 = pq - 1 = p(q - 1 + 1) - 1 \equiv p - 1 \pmod{q-1}.$$

Since $(q-1) \mid (n-1)$ this would imply that $(q-1) \mid (p-1)$, which is impossible since $0 < p - 1 < q - 1$ ($p \ge 3$ because $n$ is odd). Therefore if $n = pq$ it cannot be a Carmichael number, and hence $k > 2$, so that $n$ must be divisible by at least 3 distinct primes.


Using the Korselt criterion we can present an example of a Carmichael number. 
EXAMPLE 5.3.1.2 The integer $n = 561 = 3 \cdot 11 \cdot 17$ is a Carmichael number. It is squarefree, and $n - 1 = 560$ is divisible by 2, 10 and 16, so by the Korselt criterion it is a Carmichael number. It is well known to be the smallest Carmichael number (see the exercises).
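As a check on the Korselt criterion, the following sketch tests it on small integers and compares it with Definition 5.3.2 applied directly to every base. The helper names are hypothetical, and trial division is used only because the numbers involved are small.

```python
from math import gcd


def factorize(n: int) -> dict:
    """Prime factorization {p: e} of n >= 2 by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_carmichael_korselt(n: int) -> bool:
    """Korselt criterion (Theorem 5.3.8): odd, squarefree, composite, (p-1) | (n-1)."""
    if n < 3 or n % 2 == 0:
        return False
    f = factorize(n)
    if len(f) < 2 or any(e > 1 for e in f.values()):
        return False                      # n is prime, or n is not squarefree
    return all((n - 1) % (p - 1) == 0 for p in f)


def is_carmichael_by_definition(n: int) -> bool:
    """Definition 5.3.2 checked directly over all bases b with (b, n) = 1."""
    if n < 4 or sum(factorize(n).values()) < 2:
        return False                      # n is not composite
    return all(pow(b, n - 1, n) == 1 for b in range(2, n) if gcd(b, n) == 1)


print([n for n in range(3, 10000) if is_carmichael_korselt(n)])
# [561, 1105, 1729, 2465, 2821, 6601, 8911]
```

The direct check costs about $n$ modular exponentiations per candidate, while the criterion needs only a factorization; this is why the criterion is the practical tool for finding Carmichael numbers.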

Carmichael numbers are relatively infrequent. It has been shown for example that 
there are only 2163 Carmichael numbers among the first 25 billion integers. However 
it has been proved by Alford, Granville and Pomerance that there do exist infinitely 
many Carmichael numbers. There is a list of Carmichael numbers up to 10!° (see 
[CP]). 


Theorem 5.3.9 (Alford, Granville, Pomerance) There are infinitely many Carmichael numbers. In particular, if $C(x)$ denotes the number of Carmichael numbers less than or equal to $x$, then $C(x) > x^{2/7}$ for $x$ sufficiently large.


We note that there are conjectures on the distribution of $C(x)$ analogous to the Prime Number Theorem (see [CP]).

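To proceed further we define several stronger types of pseudoprimes. Recall that if $n = p$ is a prime then $\mathbb{Z}_p$ is a field. Hence the polynomial congruence

$$x^2 \equiv 1 \pmod p$$

has only the solutions $x \equiv 1 \pmod p$ and $x \equiv -1 \pmod p$. Therefore, if $p$ is an odd prime and $(a, p) = 1$, then, since $\left(a^{(p-1)/2}\right)^2 = a^{p-1} \equiv 1 \pmod p$, we must have

$$a^{(p-1)/2} \equiv \pm 1 \pmod p. \qquad (5.3.1)$$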


Recall that for an odd prime $p$ and $(a, p) = 1$ the Legendre symbol $(a/p)$ equals $+1$ or $-1$ according to whether or not $a$ is a quadratic residue mod $p$ (see Section 2.6). We need an extension of the Legendre symbol.


Definition 5.3.3 If $n$ is a positive odd integer with prime factorization $n = p_1^{e_1} \cdots p_k^{e_k}$ and $a$ is a positive integer, then the Jacobi symbol is

$$(a/n) = (a/p_1)^{e_1} \cdots (a/p_k)^{e_k}.$$


Several of the results concerning the Legendre symbol, including quadratic reciprocity, can be extended to the Jacobi symbol.


Theorem 5.3.10 If $m, n$ are odd positive integers then:
(1) $(2/n) = (-1)^{(n^2 - 1)/8}$;
(2) (Jacobi Quadratic Reciprocity)

$$(m/n) = (-1)^{\frac{(m-1)(n-1)}{4}}\,(n/m).$$


The proofs of both of these assertions follow easily from the corresponding results 
on the Legendre symbol and we leave them to the exercises. 
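Theorem 5.3.10, together with the multiplicativity of the Jacobi symbol in its top argument and the fact that $(a/n)$ depends only on $a \bmod n$, gives a way to evaluate $(a/n)$ without factoring $n$. A minimal Python sketch follows; the name `jacobi` is ours, and returning 0 when $(a, n) > 1$ is an assumed convention that the definition above does not cover.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, using the rules of Theorem 5.3.10.

    Returns 0 when gcd(a, n) > 1 (assumed convention).
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n                             # (a/n) depends only on a mod n
    result = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):        # (-1)^((n^2-1)/8) = -1 exactly for n = 3, 5 mod 8
                result = -result
        a, n = n, a                    # reciprocity: (a/n) = +-(n/a)
        if a % 4 == 3 and n % 4 == 3:  # sign (-1)^((m-1)(n-1)/4) is -1 only here
            result = -result
        a %= n
    return result if n == 1 else 0     # n now holds gcd of the original a and n


print(jacobi(2, 15), jacobi(3, 91), jacobi(1001, 9907))   # 1 -1 -1
```

For a prime modulus the result agrees with Euler's criterion, which gives an easy check of the implementation.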
Note that if $p$ is a prime then the Jacobi symbol and the Legendre symbol are identical. Hence, by Euler's criterion, for any odd prime $p$ and integer $a$ with $(a, p) = 1$,

$$a^{(p-1)/2} \equiv (a/p) \pmod p,$$

where on the right hand side we consider $(a/p)$ as the Jacobi symbol.


Definition 5.3.4 An odd composite integer $n$ is an Euler pseudoprime to the base $b$ if $(b, n) = 1$ and

$$b^{(n-1)/2} \equiv (b/n) \pmod n,$$

where $(b/n)$ is the Jacobi symbol.


Since $(b/n) = \pm 1$ it follows easily that an Euler pseudoprime to the base $b$ must also be a pseudoprime to the base $b$ (see the exercises). However, the converse is not true: there exist pseudoprimes to a base $b$ which are not Euler pseudoprimes to that base.

EXAMPLE 5.3.1.3 91 is a pseudoprime to the base 3 since $3^{90} \equiv 1 \pmod{91}$. However, $3^{45} \equiv 27 \pmod{91}$ and $27 \not\equiv \pm 1$, so 91 is not an Euler pseudoprime to the base 3.

What is crucial in describing our first probabilistic primality test is that there are no "Carmichael type" numbers for Euler pseudoprimes. In fact, if $n$ is odd and composite it will fail to be an Euler pseudoprime for at least $1/2$ of the bases $b$ with $(b, n) = 1$.


Theorem 5.3.11 (Solovay, Strassen) If $n$ is an odd composite integer then $n$ is an Euler pseudoprime for at most $1/2$ of the bases $b$ with $1 \le b < n$ and $(b, n) = 1$.


Proof Suppose that $n$ is odd and composite. We first show that in this case, if $n$ is not an Euler pseudoprime for at least one base $b$, then it is not an Euler pseudoprime for at least half of the bases $b$ with $1 \le b < n$, $(b, n) = 1$. We then show that if $n$ is odd and composite there is a base $b$ for which $n$ is not an Euler pseudoprime.

Suppose that $n$ is odd and composite and that $n$ is not an Euler pseudoprime to the base $b$, where $(b, n) = 1$. That is,

$$b^{(n-1)/2} \not\equiv (b/n) \pmod n.$$

If $n$ is not an Euler pseudoprime to any base then certainly it is not an Euler pseudoprime for at least half of the possible bases. Suppose then that $n$ is an Euler pseudoprime to the base $b_1$, so that

$$b_1^{(n-1)/2} \equiv (b_1/n) \pmod n.$$

Then, using $(bb_1/n) = (b/n)(b_1/n)$ and the fact that $(b_1/n) = \pm 1$ is invertible mod $n$,

$$(bb_1)^{(n-1)/2} = b^{(n-1)/2}\, b_1^{(n-1)/2} \not\equiv (b/n)(b_1/n) = (bb_1/n) \pmod n.$$

Hence $n$ is not an Euler pseudoprime to the base $bb_1$. Therefore, for every base $b_i$ for which $n$ is an Euler pseudoprime, $n$ is not an Euler pseudoprime for the base $bb_i$. Further, if $b_i, b_j$ are distinct (mod $n$) bases for which $n$ is an Euler pseudoprime, then $bb_i$ is not congruent to $bb_j$ mod $n$, since $b$ is invertible mod $n$. It follows that if $\{b_1, \ldots, b_t\}$ are the distinct bases for which $n$ is an Euler pseudoprime, then $\{bb_1, \ldots, bb_t\}$ are distinct bases for which $n$ is not an Euler pseudoprime. Therefore there are at least as many bases for which $n$ is not an Euler pseudoprime as there are bases for which it is. We conclude that if there exists at least one base $b$ for which $n$ is not an Euler pseudoprime, then $n$ is an Euler pseudoprime for at most $1/2$ of the possible bases.

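We now show that there must exist a base $b$ for which $n$ is not an Euler pseudoprime. Suppose first that $n$ is not squarefree, so that there exists a prime $p$ with $p^2 \mid n$. Let $g$ be a generator of the multiplicative group of $\mathbb{Z}_{p^2}$. Then, as in the proof of the Korselt criterion, $g$ has exact multiplicative order $\phi(p^2) = p(p-1)$. Let $m$ be the product of the primes other than $p$ dividing $n$ ($m = 1$ if there are none), and let $b$ solve the pair of congruences

$$b \equiv g \pmod{p^2}, \qquad b \equiv 1 \pmod m,$$

so that $(b, n) = 1$. Suppose that $b^{(n-1)/2} \equiv 1 \pmod n$. Then $b^{(n-1)/2} \equiv 1 \pmod{p^2}$, and it follows that $p(p-1) \mid (n-1)/2$, which is impossible since $p \mid n$ and hence $p \nmid (n-1)$. Next suppose that $b^{(n-1)/2} \equiv -1 \pmod n$. Then $b^{n-1} \equiv 1 \pmod n$, so $b^{n-1} \equiv 1 \pmod{p^2}$. It follows that $p(p-1) \mid (n-1)$. But then again $p \mid (n-1)$, a contradiction. Since $(b/n) = \pm 1$, we conclude that if $n$ is not squarefree, then $b$ as chosen above is a base for which $n$ is not an Euler pseudoprime.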

Now suppose that $n$ is squarefree with $n = p_1 \cdots p_k$, $k \ge 2$, the $p_i$ distinct primes. Let $g$ be a nonsquare mod $p_1$. Recall that there are only $(p_1 - 1)/2$ nonzero squares mod $p_1$, so such nonsquares exist. Hence $(g/p_1) = -1$. Choose a base $b$ satisfying the simultaneous congruences

$$b \equiv g \pmod{p_1}, \qquad b \equiv 1 \pmod{p_i}, \quad i = 2, \ldots, k,$$

which exists by the Chinese remainder theorem. We then have for the Jacobi symbol

$$(b/n) = (b/p_1)(b/p_2) \cdots (b/p_k).$$

But $(b/p_1) = -1$ since $b \equiv g \pmod{p_1}$, and $(b/p_i) = (1/p_i) = 1$ for $i \ge 2$. Hence

$$(b/n) = -1.$$

If $n$ were an Euler pseudoprime to the base $b$ then

$$b^{(n-1)/2} \equiv (b/n) \pmod n,$$

so that

$$b^{(n-1)/2} \equiv -1 \pmod n.$$

But then

$$b^{(n-1)/2} \equiv -1 \pmod{p_2},$$

which is a contradiction since $b \equiv 1 \pmod{p_2}$. Therefore $n$ cannot be an Euler pseudoprime to the base $b$. Hence in each case there does exist a base for which $n$ is not an Euler pseudoprime, proving the theorem.


Theorem 5.3.11 is the basis for the Solovay–Strassen Primality Test. Suppose that we are given an odd integer $n$. Choose $k$ integers $b_1, b_2, \ldots, b_k$ at random with $1 < b_i < n$. If for some $i$ we have $(b_i, n) > 1$ then $n$ is composite. If all $b_i$ are relatively prime to $n$ then for each $b_i$ compute

(1) $b_i^{(n-1)/2} \bmod n$ and
(2) $(b_i/n) \bmod n$.

If (1) does not equal (2) for some $b_i$ then $n$ is composite. Finally, if

$$b_i^{(n-1)/2} \equiv (b_i/n) \pmod n$$

for all $i = 1, \ldots, k$, then the probability that $n$ is not prime is at most $1/2^k$.

To see this notice that, by the Solovay–Strassen result, a composite $n$ passes the conditions for $b_1$ with probability at most $1/2$. But $b_2$ is chosen randomly, so the events that $n$ passes the conditions for $b_1$ and for $b_2$ are independent. Hence the probability that a composite $n$ passes the conditions for both $b_1$ and $b_2$ is at most $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, and so on.


Solovay–Strassen Primality Test: Input an odd integer $n$
1: Choose $k$ random integers $b_1, \ldots, b_k$ with $1 < b_i < n$
2: For $i = 1, \ldots, k$
  a: Compute $(b_i, n)$ (by the Euclidean algorithm)
    i: If $(b_i, n) > 1$ then $n$ is composite and stop.
  b: Compute (1) $b_i^{(n-1)/2} \bmod n$ and (2) $(b_i/n) \bmod n$
    i: If (1) $\ne$ (2) then $n$ is composite and stop.
3: The probability that $n$ is prime is greater than $1 - 1/2^k$.
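A sketch of the Solovay–Strassen test in Python follows. It assumes that Python's `random` module is an acceptable source of bases (it is not cryptographically secure); `euler_condition` and `solovay_strassen` are hypothetical names, and the compact `jacobi` routine uses the reciprocity rules of Theorem 5.3.10. The last lines check Example 5.3.1.3 and count the bases to which 561 is an Euler pseudoprime, which by Theorem 5.3.11 should be at most $\phi(561)/2 = 160$.

```python
import random
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0 via Theorem 5.3.10 (0 if gcd(a, n) > 1)."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def euler_condition(n: int, b: int) -> bool:
    """True if (b, n) = 1 and b^((n-1)/2) = (b/n) mod n, for odd n."""
    j = jacobi(b, n)
    return j != 0 and pow(b, (n - 1) // 2, n) == j % n   # j % n maps -1 to n - 1


def solovay_strassen(n: int, k: int = 20, rng=random) -> bool:
    """False: n is certainly composite.  True: n passed k random bases, so a
    composite n gets this far with probability at most 2^-k (Theorem 5.3.11)."""
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    for _ in range(k):
        b = rng.randrange(2, n)                  # 1 < b < n
        if gcd(b, n) > 1 or not euler_condition(n, b):
            return False
    return True


# Example 5.3.1.3: 3^90 = 1 but 3^45 = 27 mod 91, while (3/91) = -1.
print(pow(3, 90, 91), pow(3, 45, 91), jacobi(3, 91))     # 1 27 -1
# Number of bases 1 <= b < 561 to which 561 is an Euler pseudoprime.
print(sum(euler_condition(561, b) for b in range(1, 561)))
```

Passing an explicit `random.Random(seed)` as `rng` makes a run reproducible.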
Miller and Rabin developed an even stronger test than the above by extending the idea of an Euler pseudoprime.


Definition 5.3.5 Let $n$ be an odd composite integer with $n - 1 = 2^s t$, $t$ odd. If $b > 1$ and $(n, b) = 1$, then $n$ is a strong pseudoprime to the base $b$ if either

(1) $b^t \equiv 1 \pmod n$, or

(2) there exists $r$ with $0 \le r < s$ such that $b^{2^r t} \equiv -1 \pmod n$.


The Miller–Rabin test is based on the following theorem, analogous to the Solovay–Strassen result. It was proved independently by Monier and Rabin.


Theorem 5.3.12 For each odd composite integer $n > 9$ the number of bases $b$ with $0 < b < n$ for which $n$ is a strong pseudoprime is less than $n/4$.
If $n$ is not a strong pseudoprime to the base $b$ we say that $b$ is a witness for $n$ (a witness that $n$ is composite). Hence, if $n$ is composite, Theorem 5.3.12 says that at least $3/4$ of all the integers in $[1, n-1]$ are witnesses for $n$. The Miller–Rabin test now proceeds exactly as the Solovay–Strassen test, except that the probability that $n$ is prime is now greater than $1 - 1/4^k$.

Miller–Rabin Primality Test: Input an odd integer $n$ and suppose that $n - 1 = 2^s t$ with $t$ odd
1: Choose $k$ random integers $b_1, \ldots, b_k$ with $1 < b_i < n$
2: For $i = 1, \ldots, k$
  a: Compute $(b_i, n)$ (by the Euclidean algorithm)
    i: If $(b_i, n) > 1$ then $n$ is composite and stop
  b: Compute $m_i = b_i^t \bmod n$
    i: If $m_i \equiv \pm 1 \pmod n$ then $n$ is a strong pseudoprime to the base $b_i$; go on to the next $i$. Else
    ii: For $j = 1, \ldots, s - 1$ compute $k_j = b_i^{2^j t} \bmod n$
      If $k_j \equiv -1 \pmod n$ then $n$ is a strong pseudoprime to the base $b_i$; go on to the next $i$. If not, go on to the next $j$.
    iii: If $k_j \not\equiv -1 \pmod n$ for all $j$ then $n$ is composite and stop
3: The probability that $n$ is prime is greater than $1 - 1/4^k$
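The Miller–Rabin steps can be sketched in Python as follows. This is an illustrative implementation, assuming bases are supplied with $1 < b < n$; the function names are hypothetical. The variable `x` plays the role of $m_i$ and then of the successive $k_j$.

```python
import random


def is_strong_pseudoprime(n: int, b: int) -> bool:
    """Definition 5.3.5 for odd n > 2: True unless b is a witness for n."""
    s, t = 0, n - 1
    while t % 2 == 0:                  # write n - 1 = 2^s * t with t odd
        s, t = s + 1, t // 2
    x = pow(b, t, n)                   # m_i = b^t mod n
    if x == 1 or x == n - 1:           # condition (1), or (2) with r = 0
        return True
    for _ in range(s - 1):             # k_j = b^(2^j t) mod n for j = 1, ..., s - 1
        x = x * x % n
        if x == n - 1:
            return True
    return False                       # b is a witness: n is composite


def miller_rabin(n: int, bases) -> bool:
    """Run the test with the given bases; False means n is certainly composite."""
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    # A base sharing a factor with n can never satisfy (1) or (2), so the
    # gcd step of the algorithm is covered by is_strong_pseudoprime.
    return all(is_strong_pseudoprime(n, b) for b in bases)


def miller_rabin_random(n: int, k: int = 20, rng=random) -> bool:
    """k random bases with 1 < b < n - 1; a composite n survives with probability < 4^-k."""
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    return miller_rabin(n, [rng.randrange(2, n - 1) for _ in range(k)])
```

For instance, $2047 = 23 \cdot 89$ is a strong pseudoprime to the base 2 (since $2^{11} \equiv 1 \pmod{2047}$), so `miller_rabin(2047, [2])` returns `True`, while `miller_rabin(2047, [2, 3])` returns `False`.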
The Miller–Rabin test can be made deterministic under the assumption that the Extended Riemann Hypothesis holds (see Chapter 4). In particular Bach proved the following.


Theorem 5.3.13 Assuming that the Extended Riemann Hypothesis holds, for any odd composite integer $n$ there is a witness less than $2(\ln n)^2$.


Hence, based on the theorem, we would only have to test for witnesses less than $2(\ln n)^2$. If there are none, then $n$ is prime. This is then a deterministic polynomial time algorithm. However, it depends on the unproved Extended Riemann Hypothesis.
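Under the assumption that the Extended Riemann Hypothesis is true, Theorem 5.3.13 turns the Miller–Rabin test into a deterministic one. A minimal sketch follows; the names are hypothetical, and the answer is guaranteed correct for every $n$ only if the ERH holds. It simply tries every base below $2(\ln n)^2$.

```python
import math


def strong_test(n: int, b: int) -> bool:
    """True if odd n passes Definition 5.3.5 for the base b (n - 1 = 2^s t, t odd)."""
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    x = pow(b, t, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False


def is_prime_under_erh(n: int) -> bool:
    """Deterministic Miller-Rabin using every base b < 2 (ln n)^2.

    Valid for all n only if the Extended Riemann Hypothesis (Theorem 5.3.13) holds.
    """
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    bound = min(n - 1, math.floor(2 * math.log(n) ** 2))
    return all(strong_test(n, b) for b in range(2, bound + 1))


print(is_prime_under_erh(2 ** 61 - 1), is_prime_under_erh(561))   # True False
```

The number of bases grows only like $(\ln n)^2$, and each base costs one modular exponentiation, which is where the polynomial running time comes from.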


5.3.2 The Lucas–Lehmer Test and Prime Records


A large portion of primality testing has centered on the Mersenne primes. In fact most prime records, that is, determinations of the largest known prime, have come from finding larger and larger Mersenne primes.

Recall from Section 3.1.3 that a Mersenne number is a positive integer of the form $M_n = 2^n - 1$, $n = 1, 2, \ldots$. If $M_n$ is prime then $M_n$ is a Mersenne prime. Recall that it is not known whether or not there are infinitely many Mersenne primes; however, it is conjectured, and widely believed, that there are.

Testing Mersenne numbers for primality has been particularly fruitful because of the Lucas–Lehmer test. This is a straightforward deterministic primality test specific to the Mersenne numbers. It is relatively easy to implement on a computer and has been quite successful in finding larger and larger Mersenne primes. Historically, for the most part, the largest known Mersenne prime has also been the largest known prime, that is, the current prime record. From Theorem 3.1.6, if $M_n = 2^n - 1$ is prime then $n$ must be prime. Finding Mersenne primes is then often an experimental procedure, with random prime exponents being tested by the Lucas–Lehmer Test. In Table 5.1 we list the known Mersenne primes as of the writing of this book. Note that because the choice of prime exponents to test is random, there may be other Mersenne primes between those on the list.

When looking at this table it should be mentioned how enormous the recent Mersenne primes are. In particular the most recent (found in 2016) has close to 22.3 million decimal digits. We should also point out that although there may be intermediate Mersenne primes between those on the list, as of 2015 all Mersenne primes up to number 48 on the list had been checked. It has been conjectured that there is a prime number type theorem for Mersenne primes. In particular it has been conjectured that if $M(x)$ is the number of primes $p \le x$ with $M_p$ prime, then $M(x) \sim c \ln x$, where

$$c = \frac{e^{\gamma}}{\ln 2}$$

and $\gamma$ is Euler's constant (see [CP]).

Before giving the Lucas–Lehmer Test we review some facts about the Mersenne numbers. Recall that the Mersenne numbers are closely tied to the perfect numbers. A natural number $n$ is a perfect number if it is equal to the sum of its proper divisors, that is,

$$n = \sum_{d \mid n,\ 1 \le d < n} d.$$

For example the number 6 is perfect since its proper divisors are 1, 2, 3, which add up to 6. We then have the following concerning Mersenne numbers, Mersenne primes and their ties to perfect numbers.


Theorem 5.3.14 (1) If $M_n = 2^n - 1$ is prime then $n$ is prime (Theorem 3.1.6).

(2) If $M_p = 2^p - 1$ is a Mersenne prime then $n = 2^{p-1}(2^p - 1)$ is a perfect number. (Due to Euclid and given in Theorem 3.1.7.)

(3) Conversely, if $n$ is an even perfect number then $n = 2^{p-1}(2^p - 1)$ for some $p$, and $M_p = 2^p - 1$ is a Mersenne prime. (Due to Euler and given in Theorem 3.1.7.)
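Part (2) of Theorem 5.3.14 can be checked numerically. The sketch below (hypothetical helper names) builds $2^{p-1}(2^p - 1)$ for the first few Mersenne prime exponents and compares it with the sum of its proper divisors, computed by trial division.

```python
def sum_of_proper_divisors(n: int) -> int:
    """Sum of the divisors d of n with 1 <= d < n, by trial division."""
    total = 1 if n > 1 else 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d        # the complementary divisor n/d
        d += 1
    return total


def euclid_perfect(p: int) -> int:
    """Euclid's number 2^(p-1) (2^p - 1) from Theorem 5.3.14(2)."""
    return 2 ** (p - 1) * (2 ** p - 1)


for p in (2, 3, 5, 7, 13):             # exponents of the first five Mersenne primes
    n = euclid_perfect(p)
    print(p, n, sum_of_proper_divisors(n) == n)
# p = 11 gives 2^10 * 2047, which is not perfect since 2047 = 23 * 89.
print(sum_of_proper_divisors(euclid_perfect(11)) == euclid_perfect(11))   # False
```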


Notice that, by the theorem, in searching for Mersenne primes only prime exponents need be considered. We now state the Lucas–Lehmer Test (this was also presented in Section 3.1.3).


Theorem 5.3.15 (Lucas–Lehmer Test) Let $p$ be an odd prime and define the sequence $(S_n)$ inductively by

$$S_1 = 4 \quad \text{and} \quad S_n = S_{n-1}^2 - 2.$$

Then the Mersenne number $M_p = 2^p - 1$ is a Mersenne prime if and only if $M_p$ divides $S_{p-1}$.
Table 5.1 The known Mersenne primes $M_p$ with $p$ prime

Number   p          Discoverer and year
1        2          Unknown - pre 1500
2        3          Unknown - pre 1500
3        5          Unknown - pre 1500
4        7          Unknown - pre 1500
5        13         Anonymous - 1456
6        17         Cataldi - 1588
7        19         Cataldi - 1588
8        31         Euler - 1772
9        61         Pervushin - 1883
10       89         Powers - 1911
11       107        Powers - 1914
12       127        Lucas - 1876
13       521        Robinson - 1952
14       607        Robinson - 1952
15       1279       Robinson - 1952
16       2203       Robinson - 1952
17       2281       Robinson - 1952
18       3217       Riesel - 1957
19       4253       Hurwitz and Selfridge - 1961
20       4423       Hurwitz and Selfridge - 1961
21       9689       Gillies - 1963
22       9941       Gillies - 1963
23       11213      Gillies - 1963
24       19937      Tuckerman - 1971
25       21701      Noll and Nickel - 1978
26       23209      Noll - 1979
27       44497      Slowinski and Nelson - 1979
28       86243      Slowinski - 1982
29       110503     Colquitt and Welsh - 1988
30       132049     Slowinski - 1983
31       216091     Slowinski - 1985
32       756839     Slowinski and Gage - 1992
33       859433     Slowinski and Gage - 1994
34       1257787    Slowinski and Gage - 1996
35       1398269    Armengaud, Woltman et al. - 1996
36       2976221    Spence, Woltman et al. - 1997
37       3021377    Clarkson, Woltman, Kurowski et al. - 1998
38       6972593    Hajratwala, Woltman and Kurowski - 1999
39       13466917   Cameron, Woltman, Kurowski - 2001
40       20996011   Shafer, Woltman and Kurowski - 2003
41       24036583   Findley, Woltman and Kurowski - 2004
42       25964951   Nowak, Woltman and Kurowski - 2005
43       30402457   Cooper, Boone - 2005
44       32582657   Cooper, Boone - 2006
45       37156667   Elvenich, Woltman and Kurowski - 2008
46       42643801   Odd Magnar Strindmo, Melhus - 2009
47       43112609   Smith, Woltman and Kurowski - 2008
48       57885161   Cooper - 2013
49       74207281   Cooper - 2016


Proof We first show that if $M_p$ divides $S_{p-1}$ then $M_p$ is prime. We follow the proof given in [Br] and redone in [Tu] and [PP].

Let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$. Then $u + v = 4 = S_1$ and $uv = 1$. An easy induction (see the exercises) shows that

$$S_n = u^{2^{n-1}} + v^{2^{n-1}}.$$


Suppose that $M_p \mid S_{p-1}$. We show that $M_p$ must be a prime. Suppose not, and let $q$ be a prime dividing $M_p$ with $q \le \sqrt{M_p}$. Since $M_p \mid S_{p-1}$ we also have $q \mid S_{p-1}$.

Consider the finite field $\mathbb{Z}_q$. If 3 is a square mod $q$, that is, $(3/q) = 1$, let $F = \mathbb{Z}_q$. If 3 is not a square mod $q$ let $F$ be the extension field of $\mathbb{Z}_q$ obtained by adjoining $\sqrt{3}$. That is, $F = \mathbb{Z}_q(w)$ where $w^2 = 3$ (see Chapter 6). In either case $F$ is a finite field, of order $q$ in the former case and of order $q^2$ in the latter. Recall that the multiplicative group of a finite field is cyclic (see Chapter 2). Hence if $g \in F$ with $g \ne 0$ then $g$ has multiplicative order $d$ with either $d \mid (q-1)$ or $d \mid (q^2 - 1)$. Since $(q-1) \mid (q^2 - 1)$ we can assume without loss of generality that $d \mid (q^2 - 1)$.

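From $uv = 1$ and the induction we have

$$S_{p-1} = u^{2^{p-2}} + v^{2^{p-2}} = u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right).$$

Since $q \mid S_{p-1}$ we then obtain

$$u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right) \equiv 0 \pmod q.$$

Now $u = 2 - \sqrt{3}$ is not congruent to 0 mod $q$, for if it were,

$$2 \equiv \sqrt{3} \pmod q \implies 4 \equiv 3 \pmod q,$$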
which is possible only if $q = 1$. Hence, cancelling the nonzero factor $u^{2^{p-2}}$ in the field $F$, we get mod $q$

$$1 + v^{2^{p-1}} \equiv 0, \quad \text{that is,} \quad v^{2^{p-1}} \equiv -1.$$

Therefore $v^{2^p} = 1$ in $F$. It follows that the multiplicative order of $v$ as an element of $F$ must divide $2^p$. This order must then be a power of 2, say $2^m$. If $m \le p - 1$ then $2^m \mid 2^{p-1}$, from which it follows that $v^{2^{p-1}} = 1$ in $F$, and not $-1$ (note $1 \ne -1$ since $q$ is odd). Therefore $m$ must equal $p$, and the order of $v$ in $F$ must be exactly $2^p$.
However, as explained earlier, the order of any nonzero element of $F$ must divide $q^2 - 1$, and so $2^p \mid (q^2 - 1)$, which implies that $2^p \le q^2 - 1$. On the other hand we have $2^p = M_p + 1$ and $q \le \sqrt{M_p}$, and so we have the inequality

$$M_p + 1 = 2^p \le q^2 - 1 \le M_p - 1,$$

which is a contradiction. Therefore no such $q$ can exist, and therefore $M_p$ must be prime, proving the Lucas–Lehmer Theorem in one direction.

Conversely, we show that if $M_p$ is prime then $M_p \mid S_{p-1}$.

Let $q = M_p$ and let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$ as in the first part of the proof. We will show that

$$v^{(q+1)/2} = v^{2^{p-1}} \equiv -1 \pmod q$$

and hence

$$S_{p-1} = u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right) \equiv 0 \pmod q.$$

This then shows that $M_p \mid S_{p-1}$.

To show that $v$ has this property, notice first that $q - 1 = 2^p - 2 = 2(2^{p-1} - 1)$. It follows that $(q-1)/2$ is odd, so that $(-1/q) = (-1)^{(q-1)/2} = -1$, that is, $-1$ is not a square mod $q$.

Next, notice that $2^p = q + 1 \equiv 1 \pmod q$, while $2^{p-1} \equiv 1 \pmod p$ by Fermat's theorem; reducing the exponent modulo $p$ therefore gives $2^{2^{p-1}} \equiv 2 \pmod q$. Since $p$ is a prime $\ge 3$ it follows that, mod $q$, 2 has both a square root, $2^{1/2} = 2^{2^{p-2}}$, and a fourth root, $2^{1/4} = 2^{2^{p-3}}$.

Finally, as a preliminary, we show that 3 is not a square mod $q$. One of the three consecutive integers $q - 1, q, q + 1$ must be divisible by 3. Now $q + 1 = 2^p$ is a power of 2 and $q$ is a prime $> 3$. Hence $3 \mid (q - 1)$. Let $g$ be a generator of the multiplicative group of $\mathbb{Z}_q$. It follows that $w = g^{(q-1)/3}$ satisfies $w^3 \equiv 1 \pmod q$ and $w \not\equiv 1 \pmod q$. Since

$$w^3 - 1 = (w - 1)(w^2 + w + 1),$$

it follows that

$$w^2 + w + 1 \equiv 0 \pmod q.$$

Let $z = w - w^2$. Then, mod $q$,

$$z^2 = (w - w^2)^2 = w^2 - 2w^3 + w^4 = w^2 + w - 2 = -3.$$

Therefore $-3$ is a square mod $q$. Since $-1$ is not a square mod $q$ and $(3/q) = (-1/q)(-3/q)$, it follows that 3 is also not a square mod $q$.
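The quadratic-character facts used in this part of the proof, namely that $-1$ and 3 are nonsquares mod $q = M_p$ while 2 is a square with square root $2^{2^{p-2}}$ and fourth root $2^{2^{p-3}}$, can be spot-checked with Euler's criterion. The small sketch below does this for a few Mersenne primes from Table 5.1; the helper name `legendre` is hypothetical, and a numerical check of course proves nothing in general.

```python
def legendre(a: int, q: int) -> int:
    """Legendre symbol (a/q) by Euler's criterion, for an odd prime q not dividing a."""
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1


for p in (3, 5, 7, 13, 17, 19, 31):          # Mersenne prime exponents from Table 5.1
    q = 2 ** p - 1
    sqrt2 = pow(2, 2 ** (p - 2), q)          # claimed square root of 2 mod q
    fourth_root2 = pow(2, 2 ** (p - 3), q)   # claimed fourth root of 2 mod q
    print(p, legendre(-1, q), legendre(3, q), legendre(2, q),
          sqrt2 ** 2 % q == 2, fourth_root2 ** 4 % q == 2)
# Each line should read: p -1 -1 1 True True
```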

Since 3 is not a square mod $q$, let $F$ be the extension field of $\mathbb{Z}_q$ obtained by adjoining $\sqrt{3}$. That is, $F = \mathbb{Z}_q(r)$ where $r^2 = 3$. $F$ is then a finite field of order $q^2$.

Let $v = 2 + r = 2 + \sqrt{3}$ in $F$. Since 3 is not a square mod $q$ we have $3^{(q-1)/2} \equiv -1 \pmod q$. Hence in $F$,

$$v^q = (2 + r)^q = 2^q + r^q = 2 + r\,(r^2)^{(q-1)/2} = 2 + r \cdot 3^{(q-1)/2} = 2 - r = u.$$
Since 2 is a square mod $q$, $2^{-1}$ is also a square mod $q$. Here $2^{-1}$ is the multiplicative inverse of 2 mod $q$, which exists since $q$ is an odd prime. Let $2^{-1/2}$ be a square root of $2^{-1}$ mod $q$. Let $t \in F$ be given by

$$t = (1 + r)\, 2^{-1/2}.$$

Then in $F$ we have

$$t^2 = (1 + r)^2 \left(2^{-1/2}\right)^2 = (1 + 2r + r^2)\, 2^{-1} = (1 + 2r + 3)\, 2^{-1} = 2 + r = v.$$

Therefore $t$ is a square root of $v$ in $F$. We show that $v$ does not have a fourth root in $F$.

Suppose $v$ had a fourth root $s$. Then $s^2 = \pm t$, and since $-1$ is a square in $F$ (in the cyclic group $F^*$ the subgroup $\mathbb{Z}_q^*$ of order $q - 1$ lies in the subgroup of squares, whose order $(q^2 - 1)/2$ is divisible by $q - 1$), $t$ would have to be a square. Since $2^{-1/2}$ is a square, this would imply that $1 + r$ would have to be a square also. Hence we show that $1 + r$ is not a square in $F$. This is done by computation in $F$. The elements of $F$ are of the form $a + br$ with $a, b \in \mathbb{Z}_q$. Suppose that $(a + br)^2 = 1 + r$. Then

$$a^2 + 2abr + b^2 r^2 = (a^2 + 3b^2) + (2ab)\,r = 1 + r.$$

This would imply that

$$a^2 + 3b^2 = 1 \ \text{and}\ 2ab = 1 \implies a^2 + 3b^2 \equiv 2ab \pmod q$$
$$\implies a^2 - 2ab + 3b^2 = (a - b)^2 + 2b^2 \equiv 0 \pmod q$$
$$\implies \left(\frac{a - b}{b}\right)^2 \equiv -2 \pmod q,$$

where $b \ne 0$ since $2ab = 1$. Hence $-2$ must be a square mod $q$. However, 2 is a square mod $q$ and $-1$ is not a square mod $q$, and therefore $-2$ cannot be a square. Therefore $1 + r$ is not a square in $F$, and hence $v$ has no fourth root in $F$.
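Now $v^q = u$, so $v^{q+1} = uv = 1$, and hence $t^{2(q+1)} = v^{q+1} = 1$ in $F$. The order of $t$ therefore divides $2(q+1) = 2^{p+1}$. If it were a proper divisor it would divide $q + 1$, so that $t^{q+1} = 1$ and $t^{(q^2-1)/2} = \left(t^{q+1}\right)^{(q-1)/2} = 1$; since $F^*$ is cyclic of order $q^2 - 1$, $t$ would then be a square in $F$ and $v$ would have a fourth root. Hence, since $v$ has no fourth root, the order of $t$ in $F$ is precisely $2(q+1)$, which divides $q^2 - 1 = (q+1)(q-1)$, and the order of $v = t^2$ is exactly $q + 1 = 2^p$. But then $v^{(q+1)/2}$ is an element of order 2 in the field $F$, so

$$v^{(q+1)/2} = v^{2^{p-1}} \equiv -1 \pmod q,$$

completing the proof.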
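The heart of the proof is the identity $v^{(q+1)/2} = v^{2^{p-1}} \equiv -1$ in $\mathbb{Z}_q[\sqrt{3}]$. As an illustration (not part of the proof), the sketch below represents $a + b\sqrt{3}$ as the pair `(a, b)` with coefficients mod $q = 2^p - 1$ and computes $v^{2^{p-1}}$ by repeated squaring; the helper names are hypothetical. By Theorem 5.3.15 and the factorization $S_{p-1} = u^{2^{p-2}}(1 + v^{2^{p-1}})$, the result equals $-1$, i.e. `(q - 1, 0)`, exactly when $M_p$ is prime.

```python
def mul(x, y, q):
    """Product of a + b*sqrt(3) and c + d*sqrt(3), coefficients reduced mod q."""
    a, b = x
    c, d = y
    return ((a * c + 3 * b * d) % q, (a * d + b * c) % q)


def power(x, e, q):
    """x^e in Z_q[sqrt(3)] by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = mul(result, x, q)
        x = mul(x, x, q)
        e >>= 1
    return result


def v_half_order_power(p):
    """Return (v^(2^(p-1)), q) for v = 2 + sqrt(3) and q = 2^p - 1."""
    q = 2 ** p - 1
    return power((2, 1), 2 ** (p - 1), q), q


for p in (5, 7, 11, 13):
    value, q = v_half_order_power(p)
    print(p, q, value == (q - 1, 0))   # False only for p = 11, where 2047 = 23 * 89
```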


Based on the theorem, the algorithm for testing a Mersenne number for primality is particularly simple.


Lucas–Lehmer Algorithm
Input an odd prime $p$.
a: Let $u = 4$.
b: For $i = 3$ to $p$
  (1): Let $u = u^2 - 2 \bmod (2^p - 1)$.
  (After this step $u = S_{p-1} \bmod M_p$.)
c: If $u = 0$ output prime; else output composite.
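The algorithm translates directly into Python. In this sketch (the function name is hypothetical) the loop runs $p - 2$ times, so that on exit `u` holds $S_{p-1} \bmod M_p$; running it over the odd primes below 700 recovers the corresponding exponents in Table 5.1.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test (Theorem 5.3.15) for an odd prime p: is M_p = 2^p - 1 prime?"""
    m = 2 ** p - 1
    u = 4                              # S_1
    for _ in range(p - 2):             # i = 3, ..., p computes S_2, ..., S_{p-1} mod M_p
        u = (u * u - 2) % m
    return u == 0


def is_odd_prime(p: int) -> bool:
    """Trial division for small odd p (used only to pick exponents)."""
    return p > 2 and p % 2 == 1 and all(p % d for d in range(3, int(p ** 0.5) + 1, 2))


print([p for p in range(3, 700) if is_odd_prime(p) and lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607]
```

The exponent $p = 2$ must be handled separately, since the theorem assumes $p$ odd.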


5.3.3 Some Additional Primality Tests 


The Lucas–Lehmer test is called an $(n+1)$-test since it requires knowledge of a complete factorization of $n + 1$. (Recall $M_p = 2^p - 1$, so $M_p + 1 = 2^p$.) Other tests have been developed to handle the situation where a complete factorization of $n - 1$ is known. These are known as $(n-1)$-tests and handle, in particular, testing for Fermat primes. Recall (see Chapter 3) that the Fermat numbers are the sequence $(F_n)$ of positive integers defined by

$$F_n = 2^{2^n} + 1, \quad n = 1, 2, 3, \ldots.$$

If $F_n$ is prime it is called a Fermat prime. As discussed in Chapter 3, Fermat believed that all the numbers in this sequence were primes. In fact $F_1, F_2, F_3, F_4$ are all prime, but $F_5$ is composite. It is still an open question whether or not there are infinitely many Fermat primes; however, it has been conjectured that there are only finitely many. On the other hand, if a number of the form $2^n + 1$ is a prime for some integer $n$ then it must be a Fermat prime (see Theorem 3.1.5). Lucas' primality test (Theorem 5.3.3) can be considered an $(n-1)$-test.
Lucas' result was strengthened by Pocklington in the following form:


Theorem 5.3.16 (Pocklington's Theorem) Suppose $n - 1 = fr$ with $(f, r) = 1$ and suppose that a complete factorization of $f$ is known. Suppose that there exists an $a$ such that

$$a^{n-1} \equiv 1 \pmod n \quad \text{and} \quad \left(a^{(n-1)/q} - 1,\, n\right) = 1$$
for every prime factor $q$ of $f$. Then every prime factor of $n$ is congruent to 1 mod $f$.


Proof Let $p$ be a prime factor of $n$. Since $a^{n-1} \equiv 1 \pmod n$, the multiplicative order $d$ of $a^r$ in the finite field $\mathbb{Z}_p$ is a divisor of $(n-1)/r = f$. If $d$ were a proper divisor of $f$, then $d \mid f/q$ for some prime $q \mid f$, so $a^{(n-1)/q} = (a^r)^{f/q} \equiv 1 \pmod p$ and $p$ would divide $\left(a^{(n-1)/q} - 1, n\right)$, contradicting the hypothesis. Hence $d = f$. Therefore $f \mid (p - 1)$, since the multiplicative group of $\mathbb{Z}_p$ has order $p - 1$.


Pocklington’s theorem can then be fashioned into a primality test. 


Corollary 5.3.2 Suppose $n - 1 = fr$ with $(f, r) = 1$ and suppose that a complete factorization of $f$ is known. Suppose that there exists an $a$ such that

$$a^{n-1} \equiv 1 \pmod n \quad \text{and} \quad \left(a^{(n-1)/q} - 1,\, n\right) = 1$$

for every prime factor $q$ of $f$. Then if $f > \sqrt{n}$ it follows that $n$ is prime.


Proof From Theorem 5.3.16 it follows that each prime factor $p$ of $n$ is congruent to 1 mod $f$. Hence $p > f$. But $f > \sqrt{n}$, so each $p > \sqrt{n}$. Therefore $n$ cannot have a prime factor $\le \sqrt{n}$, and so $n$ is prime.
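Corollary 5.3.2 can be used to check a proposed primality certificate $(f, a)$. The sketch below is illustrative only: the function name is hypothetical, and it trusts the caller's list of the prime factors of $f$ rather than verifying it.

```python
from math import gcd


def pocklington_certifies_prime(n: int, f: int, f_prime_factors, a: int) -> bool:
    """Check the hypotheses of Corollary 5.3.2 for the certificate (f, a).

    Assumes f_prime_factors lists every prime dividing f. True means n is
    proved prime; False means only that this certificate does not work.
    """
    if n < 3 or (n - 1) % f != 0:
        return False
    r = (n - 1) // f
    if gcd(f, r) != 1 or f * f <= n:           # need (f, r) = 1 and f > sqrt(n)
        return False
    if pow(a, n - 1, n) != 1:
        return False
    return all(gcd(pow(a, (n - 1) // q, n) - 1, n) == 1 for q in f_prime_factors)


print(pocklington_certifies_prime(97, 32, [2], 5))        # True: 96 = 32 * 3
print(pocklington_certifies_prime(65537, 65536, [2], 3))  # True: F_4 is prime
print(pocklington_certifies_prime(91, 10, [2, 5], 3))     # False: gcd(3^45 - 1, 91) = 13
```

In practice one searches for a suitable $a$, and $f$ needs to be only the fully factored part of $n - 1$, which is what makes the corollary useful for $n$ with $n - 1$ partly known.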


Pocklington's theorem, which dates from 1914, actually extended several earlier results that were specific to the testing of Fermat numbers for primality. Pepin's theorem (Theorem 5.3.17) dates from 1877 and Proth's theorem from 1878.


Theorem 5.3.17 (Pepin's Theorem) Let $F_n = 2^{2^n} + 1$ be the $n$th Fermat number. Then $F_n$ is prime if and only if

$$3^{(F_n - 1)/2} \equiv -1 \pmod{F_n}.$$


Proof If $3^{(F_n - 1)/2} \equiv -1 \pmod{F_n}$, then the argument used in proving Pocklington's theorem with $a = 3$ (here $F_n - 1 = 2^{2^n}$, whose only prime factor is 2) can be used to show that $F_n$ is prime. Conversely suppose $F_n$ is prime. Then $3^{(F_n - 1)/2} \equiv (3/F_n) \pmod{F_n}$, where $(3/F_n)$ is the Jacobi symbol. It is straightforward to check (see the exercises) that $(3/F_n) = -1$.
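Pepin's criterion needs only one modular exponentiation. Here is a minimal Python sketch (the function name is hypothetical), relying on Python's built-in three-argument `pow` for the repeated squaring.

```python
def pepin(k: int) -> bool:
    """Pepin's test (Theorem 5.3.17) for the Fermat number F_k = 2^(2^k) + 1, k >= 1."""
    F = 2 ** (2 ** k) + 1
    return pow(3, (F - 1) // 2, F) == F - 1    # is 3^((F-1)/2) = -1 mod F ?


print([pepin(k) for k in range(1, 9)])
# [True, True, True, True, False, False, False, False]
```

Since $(F_k - 1)/2 = 2^{2^k - 1}$, the computation is just $2^k - 1$ successive squarings of 3 modulo $F_k$.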


Theorem 5.3.18 (Proth's Theorem) Let $n = f \cdot 2^k + 1$ with $2^k > f$. If there exists an integer $a$ with $a^{(n-1)/2} \equiv -1 \pmod n$ then $n$ is prime.


Proof The same arguments as in the proof of Pocklington's theorem can be applied.
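The following is a sketch of a search for a Proth witness. The function name and the search limit `max_a` are hypothetical choices. Finding a witness proves $n$ prime by Theorem 5.3.18; failing to find one within the limit proves nothing by itself.

```python
def proth_witness(f: int, k: int, max_a: int = 50):
    """Search for a in 2..max_a with a^((n-1)/2) = -1 mod n, where n = f*2^k + 1.

    Assumes 2^k > f as in Theorem 5.3.18. Returns the witness or None.
    """
    if f < 1 or 2 ** k <= f:
        raise ValueError("need f >= 1 and 2^k > f")
    n = f * 2 ** k + 1
    for a in range(2, min(max_a, n - 1) + 1):
        if pow(a, (n - 1) // 2, n) == n - 1:
            return a
    return None


print(proth_witness(3, 2))   # n = 13: 2^6 = 64 = -1 mod 13, so 2
print(proth_witness(3, 5))   # n = 97: first witness is 5
print(proth_witness(3, 3))   # n = 25 is composite, so no witness exists: None
```

For a prime $n$ of this form the witnesses are exactly the quadratic nonresidues, so a short search normally succeeds.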


These results, together with the Lucas–Lehmer test, just begin to scratch the surface of primality testing. A complete discussion of primality testing, together with discussions of the computational complexity of both primality testing and factorization algorithms, can be found in the excellent and comprehensive book by Crandall and Pomerance [CP]. There are also many suggestions given in [CP] for research problems.

Recent work, leading eventually to the polynomial time algorithm (AKS), has concentrated on improving both the running time and computational complexity of primality testing algorithms. The major breakthrough from a computational point of view came with the development in 1983 by Adleman, Pomerance, and Rumely of a deterministic algorithm (the APR algorithm) based on Jacobi sums (see [CP]) which ran in subexponential time. The fact that this could be done was in essence the first step toward the eventual polynomial time algorithm. The approach of the APR algorithm extended a line of research that considered testing for primality via Gauss sums (see [CP]).


5.3.4 Elliptic Curve Methods 


There have been many additional approaches to primality testing. A very fruitful approach, which has had wide-ranging applications both in number theory and cryptography, is to use elliptic curves. In this section we define and explain elliptic curves and their use in primality testing. Then in Section 5.6 we will discuss elliptic curve cryptography.

If $F$ is a field of characteristic not equal to 2 or 3 then an elliptic curve over $F$ is the locus of points $(x, y) \in F \times F$ satisfying the equation

$$y^2 = x^3 + ax + b \quad \text{with} \quad 4a^3 + 27b^2 \ne 0.$$

We denote by $O$ a single point at infinity and let

$$E(F) = \{(x, y) \in F \times F : y^2 = x^3 + ax + b\} \cup \{O\}.$$

We also call $E(F)$ an elliptic curve over $F$.

The important thing about elliptic curves from the viewpoint of number theory and primality testing is that a group structure can be placed on $E(F)$. In particular we define the operation $+$ on $E(F)$ by the following rules, which for future reference we will denote by (ECR):

1. $O + P = P$ for any point $P \in E(F)$.

2. If $P = (x, y)$ then $-P = (x, -y)$, and $-O = O$.

3. $P + (-P) = O$ for any point $P \in E(F)$.

4. If $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ with $P_1 \ne -P_2$, then $P_1 + P_2 = (x_3, y_3)$ with

$$x_3 = m^2 - (x_1 + x_2), \qquad y_3 = -m(x_3 - x_1) - y_1,$$

and

$$m = \frac{y_2 - y_1}{x_2 - x_1} \ \text{ if } x_2 \ne x_1, \qquad m = \frac{3x_1^2 + a}{2y_1} \ \text{ if } x_2 = x_1.$$
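The rules (ECR) can be sketched in Python for a prime field $\mathbb{Z}_p$. This is an illustration under stated assumptions: points are tuples, `None` stands for $O$, the helper names are hypothetical, and the example curve $y^2 = x^3 + 2x + 2$ over $\mathbb{Z}_{17}$ is chosen only because its group is small (it has 19 points, a prime, so every point other than $O$ has order 19). `pow(x, -1, p)` (Python 3.8 or later) computes the inverse mod $p$.

```python
def ec_add(P, Q, a: int, p: int):
    """Add points of y^2 = x^3 + a x + b over Z_p (p prime) by the rules (ECR).

    b is not needed here: it only determines which points lie on the curve.
    """
    if P is None:                       # rule 1: O + Q = Q
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                     # rule 3: P + (-P) = O
    if x1 != x2:                        # chord through P and Q
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:                               # tangent at P (here Q = P, y1 != 0)
        m = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (-m * (x3 - x1) - y1) % p
    return (x3, y3)


def ec_mul(k: int, P, a: int, p: int):
    """k * P by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R


P = (5, 1)                     # a point on y^2 = x^3 + 2x + 2 over Z_17
print(ec_add(P, P, 2, 17))     # (6, 3)
print(ec_mul(3, P, 2, 17))     # (10, 6)
print(ec_mul(19, P, 2, 17))    # None, i.e. 19P = O
```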
This operation has a very nice geometric interpretation if $F = \mathbb{R}$, the real numbers. It is known as the chord and tangent method. If $P_1 \ne P_2$ are two points on the curve, then the line through $P_1, P_2$ intersects the curve at a third point $P_3$. If we reflect $P_3$ through the $x$-axis we get $P_1 + P_2$. If $P_1 = P_2$ we take the tangent line at $P_1$ instead.

With this operation $E(F)$ becomes an abelian group (due to Cassels) whose structure can be worked out (see [CP]).


Theorem 5.3.19 $E(F)$ together with the operation defined above forms an abelian group. If $F$ is a finite field of order $p^k$ then $E(F)$ is either cyclic or has the structure

$$E(F) \cong \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2}$$

with $m_1 \mid m_2$ and $m_1 \mid (p^k - 1)$.


By considering the order of the group $E(F)$ over finite fields, Lenstra developed a factorization algorithm (ECM) (see [CP]). His method, as well as elliptic curve primality testing, depends on the concept of an elliptic pseudocurve. An important fact in forming the elliptic curve group is that $F$ is a field. An elliptic pseudocurve is the set of points satisfying an elliptic curve equation over a modular ring $\mathbb{Z}_n$, not necessarily a field. If $n$ is not a prime then we cannot expect the group laws to hold in full; even the sum of two points is not necessarily defined in all cases. This is the idea behind Lenstra's Factorization Algorithm (ECM).

In particular, if $n$ is a positive integer with $(n, 6) = 1$, $a, b \in \mathbb{Z}_n$, and $(4a^3 + 27b^2, n) = 1$ when $a, b$ are considered as integers, then an elliptic pseudocurve over $\mathbb{Z}_n$ is a set

$$E(\mathbb{Z}_n) = \{(x, y) \in \mathbb{Z}_n \times \mathbb{Z}_n : y^2 = x^3 + ax + b\} \cup \{O\}$$

with $O$ a point at infinity. As usual we identify $\mathbb{Z}_n$ with $\{0, 1, \ldots, n-1\}$. The name pseudocurve indicates that $\mathbb{Z}_n$ need not be a field.

Using the concept of a pseudocurve, Goldwasser and Kilian developed an elliptic curve analog of Pocklington's theorem (Theorem 5.3.16), which ushered in elliptic curve primality proving (ECPP). We state the theorem and then discuss pseudocurves in more detail.


Theorem 5.3.20 (ECPP) Let $n > 1$ with $(n, 6) = 1$, let $E(\mathbb{Z}_n)$ be an elliptic pseudocurve over $\mathbb{Z}_n$, and let $s, m$ be positive integers with $s \mid m$. Let $[m]$ denote the residue class of $m$ and assume that there exists a point $P \in E$ such that $[m]P = O$ and $[m/q]P \ne O$ for every prime divisor $q$ of $s$. Then for every prime $p$ dividing $n$ we have

$$|E(\mathbb{Z}_p)| \equiv 0 \pmod s.$$

Further, if $s > (n^{1/4} + 1)^2$ then $n$ is prime.


The Goldwasser–Kilian theorem was improved upon by Atkin and Morain, who developed a very efficient elliptic curve primality testing algorithm. In practice this algorithm seems at present to be the fastest computationally. However, it is felt that ultimately an implementation of the theoretically faster AKS algorithm will be developed that will be computationally faster.

To handle pseudocurves we first transfer the addition operation from elliptic curves to pseudocurves, with the restriction that the results are not always defined. Let us consider the rules (ECR) for the definition of $P_1 + P_2$. If $P_1 \ne P_2$ then $P_1 + P_2$ is defined only if $x_2 - x_1$ is invertible in $\mathbb{Z}_n$, that is, if $(x_2 - x_1, n) = 1$. If $P_1 = P_2$ then $P_1 + P_1 = 2P_1$ is only defined if $2y_1$ is invertible in $\mathbb{Z}_n$, that is, if $(y_1, n) = 1$, because $2 \nmid n$. We remark that if $P = (x, y)$ then $-P = (x, -y)$, and $P + (-P) = P - P = O$ always exists.

The idea of using this for factorization is due to H.W. Lenstra Jr. We now describe the details
of the method. As above let $n \in \mathbb{N}$ with $n > 2$ and $(n, 6) = 1$. First, we randomly
choose $a, x, y \in \{0, 1, \ldots, n - 1\}$ and then compute $b$ according to the rule

$$b = y^2 - x^3 - ax \bmod n.$$


If $(4a^3 + 27b^2, n) \neq 1$, then either we have found a nontrivial divisor of $n$ or we
repeat the process. If we determined $a$ and $b$ first, it would be more difficult to find a
(random) point on the curve.

The pseudocurve $E(\mathbb{Z}_n)$ is now given by the equation

$$y^2 = x^3 + ax + b \quad \text{with } (4a^3 + 27b^2, n) = 1,$$

and the point on $E(\mathbb{Z}_n)$ is $P = (x, y)$.


Lemma 5.3.2. Let $m > 1$ be a divisor of $n$ and let $P_1, P_2 \in E(\mathbb{Z}_n)$ be points
such that $P_1 + P_2$ is defined. Then

$$(P_1 \bmod m) + (P_2 \bmod m) = (P_1 + P_2) \bmod m.$$


Proof We may assume that $P_1 \neq 0$ and $P_2 \neq 0$, and let $P_1 = (x_1, y_1)$,
$P_2 = (x_2, y_2)$. The ring homomorphism $\bmod m: \mathbb{Z}_n \to \mathbb{Z}_m$ is compatible
with forming inverses, that is, for $(x, n) = 1$ we have

$$x^{-1} \bmod m = (x \bmod m)^{-1},$$

where on the left-hand side of the equation the inverse is taken modulo $n$, while on the
right-hand side it is taken in $\mathbb{Z}_m$. If the same computation rules are applied for
computing the points in $E(\mathbb{Z}_n)$ and $E(\mathbb{Z}_m)$, then we can move the reduction
mod $m$ inside the terms. In particular, we then have that

$$(P_1 \bmod m) + (P_2 \bmod m)$$

is defined.


The critical cases to consider are:
(a) $x_1 \not\equiv x_2 \bmod n$ and $x_1 \equiv x_2 \bmod m$,
(b) $x_1 \equiv x_2 \bmod n$, $y_1 \equiv y_2 \not\equiv 0 \bmod n$, and $y_1 \equiv -y_2 \bmod m$,
(c) $x_1 \equiv x_2 \bmod n$ and $y_1 \not\equiv \pm y_2 \bmod n$.
In case (a), $P_1 + P_2$ is not defined because $x_2 - x_1$ is divisible by $m$ and therefore
not invertible modulo $n$. Thus the conditions of the statement are not met.
In case (b), $P_1 + P_2$ is also not defined because $2y_1 \equiv y_1 + y_2 \bmod n$ and
$m \mid (y_1 + y_2)$. Thus $2y_1$ is not invertible.
If case (c) occurs, again $P_1 + P_2$ is not defined.


Let $E(\mathbb{Z}_n)$, $n \in \mathbb{N}$ with $n > 2$ and $(6, n) = 1$, be a (pseudo)curve. If
$n$ is not a prime, then there are $P_1, P_2 \in E(\mathbb{Z}_n)$ such that $P_1 + P_2$ is not
defined. It follows that if $n$ is not a prime, then there exist

$$P_1 = (x_1, y_1) \in E(\mathbb{Z}_n),\ P_1 \neq 0 \quad \text{and} \quad P_2 = (x_2, y_2) \in E(\mathbb{Z}_n),\ P_2 \neq 0$$

such that one of the following holds:

(a) $x_1 \not\equiv x_2 \bmod n$ and $x_1 \equiv x_2 \bmod m$ for a divisor $m > 1$ of $n$,

(b) $x_1 \equiv x_2 \bmod n$, $y_1 \equiv y_2 \not\equiv 0 \bmod n$ and $y_1 \equiv -y_2 \bmod m$ for a
divisor $m > 1$ of $n$,

(c) $x_1 \equiv x_2 \bmod n$ and $y_1 \not\equiv \pm y_2 \bmod n$.


Theorem 5.3.21 Suppose that $P_1 + P_2$ is not defined for two points
$P_1, P_2 \in E(\mathbb{Z}_n)$. Then this yields a nontrivial divisor of $n$.


Proof Let $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$. If $P_1 + P_2$ is undefined, then there
are three possibilities.

The first is that $x_1 \not\equiv x_2 \bmod n$ but $x_2 - x_1$ is not invertible modulo $n$. Then
$x_2 - x_1$ is not a multiple of $n$ but also not relatively prime to $n$. Thus $(x_2 - x_1, n)$
is a nontrivial divisor of $n$.

The second possibility is $x_1 \equiv x_2 \bmod n$ and $y_1 \equiv y_2 \not\equiv 0 \bmod n$, but
$2y_1$ is not invertible modulo $n$. Since $n$ is odd, the greatest common divisor $(y_1, n)$
yields a nontrivial divisor of $n$.

The final possibility is $x_1 \equiv x_2 \bmod n$ but $y_1 \not\equiv \pm y_2 \bmod n$. Then we
have

$$y_2^2 - y_1^2 \equiv (x_2^3 + ax_2 + b) - (x_1^3 + ax_1 + b) \equiv 0 \bmod n.$$

Therefore

$$y_2^2 - y_1^2 = (y_2 + y_1)(y_2 - y_1)$$

is a multiple of $n$, but neither $y_2 + y_1$ nor $y_2 - y_1$ is a multiple of $n$. This implies
that both $(y_1 + y_2, n)$ and $(y_2 - y_1, n)$ are nontrivial divisors of $n$.


We give an example of using this theorem to factorize n. 


EXAMPLE 5.3.4.1 Let $n = 1715761513$ and $C$ a curve given by $y^2 = x^3 + ax + b$ with
$a = 42$ and $b = -91$. We first check that $C$ is an elliptic pseudocurve for $n$. To do this
we calculate that $\gcd(6, n) = \gcd(4a^3 + 27b^2, n) = 1$.
The point $P = (2, 1)$ is an element of $C$, so we may test with $P$. For suitably chosen
positive integers $k_1$ and $k_2$, let

$$P_1 = k_1 P = (x_1, y_1), \qquad P_2 = k_2 P = (x_2, y_2).$$

Then we calculate

$$P_1 = (1115004543, 1676196055) \quad \text{and} \quad P_2 = (1267572925, 848156341) \bmod n.$$


Then $x_2 - x_1 = 152568382$ and $\gcd(152568382, n) = 26927$. Hence $26927$ divides $n$ and
we find that

$$n = 26927 \cdot 63719.$$
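The method of this section (random curve and point, repeated additions, and Theorem 5.3.21 to turn an undefined sum into a divisor) can be sketched in Python. This is a minimal illustration under our own assumptions, not the book's implementation: the point at infinity is represented by `None`, the multiplier is the product of all prime powers up to a bound `B` (a common choice that the text does not prescribe), and the names `FactorFound`, `ec_add`, `ec_mul` and `lenstra_ecm` are hypothetical.

```python
import math
import random


class FactorFound(Exception):
    """Raised when a sum on the pseudocurve is undefined; d is a divisor of n."""

    def __init__(self, d):
        super().__init__(d)
        self.d = d


def inv_mod(v, n):
    """Inverse of v modulo n, or FactorFound(gcd(v, n)) if it does not exist."""
    v %= n
    g = math.gcd(v, n)
    if g != 1:
        raise FactorFound(g)
    return pow(v, -1, n)


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + ax + b over Z_n (b is not needed); None is 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None                                   # P + (-P) = 0
        if (y1 - y2) % n != 0:
            raise FactorFound(math.gcd(y2 - y1, n))       # x1 = x2 but y1 != +-y2
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n  # doubling rule
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def ec_mul(k, P, a, n):
    """kP by double-and-add; an undefined sum propagates as FactorFound."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, n)
        k >>= 1
        if k:
            P = ec_add(P, P, a, n)
    return R


def lenstra_ecm(n, B=1000, curves=200, seed=1):
    """Try random pseudocurves mod n; return a nontrivial divisor or None."""
    rng = random.Random(seed)
    primes = [p for p in range(2, B + 1)
              if all(p % r for r in range(2, math.isqrt(p) + 1))]
    for _ in range(curves):
        a, x, y = (rng.randrange(n) for _ in range(3))
        b = (y * y - x ** 3 - a * x) % n          # puts P = (x, y) on the curve
        g = math.gcd(4 * a ** 3 + 27 * b * b, n)
        if 1 < g < n:
            return g
        if g == n:
            continue
        P = (x, y)
        try:
            for p in primes:
                pe = p
                while pe * p <= B:                 # largest power of p up to B
                    pe *= p
                P = ec_mul(pe, P, a, n)
                if P is None:
                    break
        except FactorFound as exc:
            if 1 < exc.d < n:
                return exc.d
    return None
```

Calling `lenstra_ecm(1715761513)` should return one of the two factors of the example; which one, and after how many curves, depends on the random choices.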


We now consider prime number certification using the method of Goldwasser and Kilian. The
idea behind a certificate of primality for a number $n$ is to provide an efficiently verifiable
proof that $n$ is a prime number. One approach goes back to H. Pocklington, namely Theorem
5.3.16 and Corollary 5.3.2. We restate it below and then present an example using it.


Theorem 5.3.22 (Pocklington) Let $a, k, n, q \in \mathbb{N}$ with $n - 1 = qk$ and $q > k$, and
let the following properties be satisfied:

1. $q$ is a prime number,
2. $a^{n-1} \equiv 1 \bmod n$,
3. $(a^k - 1, n) = 1$.

Then $n$ is a prime number.


EXAMPLE 5.3.4.2 We present a certification using Pocklington's theorem that $2922259$ is a
prime number. Note that this number is small enough that it can be checked directly. We have
$2922259 - 1 = 1721 \cdot 1698$ and

$$2^{2922259 - 1} \equiv 1 \bmod 2922259, \qquad (2^{1698} - 1, 2922259) = 1.$$

Both facts can be checked efficiently using modular exponentiation and the Euclidean algorithm.
If we knew that $1721$ is a prime number, then Pocklington's theorem would certify that
$2922259$ is also a prime number.
We use the same approach for $1721$. We have $1721 - 1 = 43 \cdot 40$ and

$$2^{1721 - 1} \equiv 1 \bmod 1721, \qquad (2^{40} - 1, 1721) = 1.$$

Since $43$ is a prime number, it follows that $1721$ is prime and finally that $2922259$ is
prime. The certificate for primality now consists of all the numbers involved in the proof
above:

$$\begin{array}{lll}
n_1 = 43 & q_1 = 7 & a_1 = 2 \\
n_2 = 1721 & q_2 = 43 & a_2 = 2 \\
n_3 = 2922259 & q_3 = 1721 & a_3 = 2
\end{array}$$
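Checking such a certificate needs only modular exponentiation and gcds. The sketch below assumes a certificate is given as a list of triples `(n_i, q_i, a_i)` ordered from the smallest $n_i$ upward, with the first $q_i$ taken from a small list of trusted primes; the function names and the trusted list are hypothetical choices for illustration.

```python
import math


def pocklington_step(n, q, a):
    """Hypotheses 2 and 3 of Theorem 5.3.22, with n - 1 = qk and q > k.

    Hypothesis 1 (q is prime) is the caller's responsibility.
    """
    if n < 3 or (n - 1) % q != 0:
        return False
    k = (n - 1) // q
    return (q > k
            and pow(a, n - 1, n) == 1
            and math.gcd(pow(a, k, n) - 1, n) == 1)


def verify_chain(chain, trusted=(2, 3, 5, 7, 11, 13)):
    """Check a certificate [(n_1, q_1, a_1), (n_2, q_2, a_2), ...].

    Each q_i must be a trusted small prime or an n_j certified earlier.
    """
    known = set(trusted)
    for n, q, a in chain:
        if q not in known or not pocklington_step(n, q, a):
            return False
        known.add(n)
    return True
```

The full certificate of the example would be checked with `verify_chain([(43, 7, 2), (1721, 43, 2), (2922259, 1721, 2)])`.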


The problem with primality certification using Pocklington's method in this form is that it
only works for numbers $n$ where $n - 1$ has a large prime factor. The method of Goldwasser and
Kilian carries Pocklington's idea over to elliptic curves. Here, by choosing different curves,
very many groups are available. As with the factorization problem, this method is worth
applying only if one can already count, with high probability, on $n$ being a prime number.

If $p$ is a prime, there are many results concerning bounds on the order $|E(\mathbb{Z}_p)|$ of
the elliptic curve group. We need the following theorem of Hasse, a proof of which can be found
in the book by J.H. Silverman [Si].


Theorem 5.3.23 (Hasse's Theorem) Let $F = \mathbb{F}_q$ with $q = p^n$, $p$ prime and
$n \geq 1$, and let $E(F)$ be the elliptic curve group for the elliptic curve
$y^2 = x^3 + ax + b$. Then

$$q + 1 - 2\sqrt{q} \leq |E(F)| \leq q + 1 + 2\sqrt{q}.$$
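For a small prime field the bound can be checked directly by counting points. The brute-force sketch below (hypothetical helper names; prime fields $\mathbb{Z}_p$ with $p > 3$ only; far too slow for cryptographic sizes) counts $|E(\mathbb{Z}_p)|$ and tests $(|E| - p - 1)^2 \leq 4p$, which is Hasse's inequality written in exact integer arithmetic.

```python
def count_points(p, a, b):
    """|E(Z_p)| for y^2 = x^3 + ax + b, p a prime > 3, by brute force."""
    roots = {}                                   # value -> number of y with y^2 = value
    for y in range(p):
        s = y * y % p
        roots[s] = roots.get(s, 0) + 1
    affine = sum(roots.get((x ** 3 + a * x + b) % p, 0) for x in range(p))
    return affine + 1                            # plus the point at infinity


def hasse_holds(p, a, b):
    """(|E| - p - 1)^2 <= 4p, i.e. ||E| - (p + 1)| <= 2 sqrt(p)."""
    N = count_points(p, a, b)
    return (N - p - 1) ** 2 <= 4 * p
```

For example, the curve $y^2 = x^3 + 2x + 1$ over $\mathbb{Z}_5$ has $7$ points, well inside the interval $[6 - 2\sqrt{5}, 6 + 2\sqrt{5}]$.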
With this we give another primality test using elliptic pseudocurves. 


Theorem 5.3.24 Let $n \in \mathbb{N}$ and let $E(\mathbb{Z}_n)$ be a pseudocurve over
$\mathbb{Z}_n$. Suppose that $0 \neq P \in E(\mathbb{Z}_n)$ and that $q > (n^{1/4} + 1)^2$ is a
prime number. If $qP = 0$ in $E(\mathbb{Z}_n)$, then $n$ is a prime number.


Proof Suppose that $n$ is not a prime number. Then there exists a prime factor $p$ of $n$ with
$p \leq \sqrt{n}$. Let $d$ be the order of $P$ (that is, of $P \bmod p$) in the elliptic curve
group $E(\mathbb{Z}_p)$. It is clear from the lemma that we have $qP = 0$ in $E(\mathbb{Z}_p)$.
This implies that $d \mid q$. Since $q$ is a prime number and $d \neq 1$, we obtain $d = q$.
Thus $q \leq |E(\mathbb{Z}_p)|$. However, from Hasse's theorem we have

$$|E(\mathbb{Z}_p)| \leq p + 1 + 2\sqrt{p} = (\sqrt{p} + 1)^2 \leq (n^{1/4} + 1)^2 < q,$$

providing a contradiction. Therefore $n$ must be prime.


We note that this is a special case of Theorem 5.3.20. 
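A single step of such a certificate can be checked mechanically. The sketch below tests the hypotheses of Theorem 5.3.24 for given $n$, curve coefficients $a, b$, point $P$ and a number $q$ that is assumed to be prime already (in the Goldwasser-Kilian method $q$ carries its own certificate). The names are hypothetical, and the bound $(n^{1/4} + 1)^2$ is evaluated in floating point, which is adequate only for small illustrations.

```python
import math


class Undefined(Exception):
    """A sum on the pseudocurve is undefined, so n is composite."""


def ec_add(P, Q, a, n):
    """Pseudocurve addition on y^2 = x^3 + ax + b over Z_n; None is 0."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None
        if (y1 - y2) % n != 0:
            raise Undefined
        num, den = 3 * x1 * x1 + a, 2 * y1       # doubling
    else:
        num, den = y2 - y1, x2 - x1
    if math.gcd(den % n, n) != 1:
        raise Undefined
    lam = num * pow(den % n, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def ec_mul(k, P, a, n):
    """kP by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, n)
        k >>= 1
        if k:
            P = ec_add(P, P, a, n)
    return R


def certifies_prime(n, a, b, P, q):
    """True if (a, b, P, q) satisfies the hypotheses of Theorem 5.3.24 for n."""
    if math.gcd(n, 6) != 1 or math.gcd(4 * a ** 3 + 27 * b * b, n) != 1:
        return False                             # not a pseudocurve
    x, y = P
    if (y * y - x ** 3 - a * x - b) % n != 0:
        return False                             # P is not on the curve
    if not q > (n ** 0.25 + 1) ** 2:             # floating point: toy sizes only
        return False
    try:
        return ec_mul(q, P, a, n) is None        # is qP = 0 in E(Z_n)?
    except Undefined:
        return False                             # n is composite
```

For instance, $y^2 = x^3 + 2x + 1$ over $\mathbb{Z}_5$ has $7$ points, so $P = (0, 1)$ has order $7 > (5^{1/4} + 1)^2 \approx 6.23$, and `certifies_prime(5, 2, 1, (0, 1), 7)` accepts.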

We now describe the algorithm of Goldwasser and Kilian. Let $n \in \mathbb{N}$; we want to prove
that $n$ is prime, and the test used should also provide a certificate for the primality. For
small $n$ this can be settled by direct primality testing, so we assume that $n$ is
sufficiently large. First, by a probabilistic procedure we convince ourselves that with high
probability $n$ is a prime number. Then we choose a random (pseudo)curve $E$ over
$\mathbb{Z}_n$ and compute the number $|E(\mathbb{Z}_n)|$ under the assumption that $n$ is prime
(see [BFKR]). If this calculation is not possible, for example due to division by zero, then
$n$ is not a prime. We keep searching until $|E(\mathbb{Z}_n)| = kq$ for a number $q$ that is
very likely a prime and satisfies


it <q<5 


for n sufficiently small and k in some sense is small. 

Before we certify that $q$ is a prime, we choose a random point $P = (x, y) \in E(\mathbb{Z}_n)$.
To do this we repeatedly choose an $x \in \mathbb{Z}_n$ at random until a $y \in \mathbb{Z}_n$ is
found such that $y^2 \equiv x^3 + ax + b \bmod n$. To determine $y$ we use one of the randomized
methods for extracting square roots in finite fields. Should the process fail, then the chances
are good that we find a proper divisor of $n$ and thus prove that $n$ is not prime.
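The random choice of a point can be sketched with the Tonelli-Shanks algorithm, one of the randomized square-root methods referred to above (the choice of this particular method is ours). The sketch assumes the modulus is an odd prime, as it is with high probability at this stage; for a composite modulus the loops are not guaranteed to terminate, and a careful implementation would bound them. Function names are hypothetical.

```python
import random


def sqrt_mod(c, p, rng):
    """A square root of c modulo an odd prime p (Tonelli-Shanks), or None."""
    c %= p
    if c == 0:
        return 0
    if pow(c, (p - 1) // 2, p) != 1:           # Euler's criterion: non-residue
        return None
    q, s = p - 1, 0                            # p - 1 = q * 2^s with q odd
    while q % 2 == 0:
        q //= 2
        s += 1
    z = rng.randrange(2, p)                    # random search for a non-residue
    while pow(z, (p - 1) // 2, p) != p - 1:
        z = rng.randrange(2, p)
    m, cz, t, r = s, pow(z, q, p), pow(c, q, p), pow(c, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t                           # least i with t^(2^i) = 1
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        f = pow(cz, 1 << (m - i - 1), p)
        m, cz, t, r = i, f * f % p, t * f * f % p, r * f % p
    return r


def random_point(p, a, b, rng):
    """Random affine point on y^2 = x^3 + ax + b over Z_p, assuming p is prime."""
    while True:
        x = rng.randrange(p)
        y = sqrt_mod(x ** 3 + a * x + b, p, rng)
        if y is not None:
            return (x, y)
```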

In the next step we compute $P' = kP \in E(\mathbb{Z}_n)$. If $kP$ is not defined, then $n$ is
not prime.

If $kP = 0$, we search for a new point $P \in E(\mathbb{Z}_n)$.

If $P' \neq 0$, then $P'$ must have order $q$, or $n$ is not a prime number.

If the computation of $qP'$ in $E(\mathbb{Z}_n)$ succeeds and gives $qP' = 0$, then from
Theorem 5.3.24 we obtain that $n$ is a prime number unless $q$ is not a prime.

Therefore, finally, we apply the method recursively on input $q$ to certify that $q$ is a
prime. The certificate of the primality of $n$ consists of the parameters of $E$, the point
$P$, and the value $q$, together with a certificate for the primality of $q$. If this algorithm
yields a result, then the result and the certificate are correct.

A comprehensive description and discussion of elliptic curve methods can be 
found in Crandall and Pomerance [CP]. 


5.4 Cryptography and Primes 


Cryptography refers to the science and/or art of sending and receiving coded mes- 
sages. Coding and hidden ciphering is an old endeavor used by governments and mil- 
itaries and between private individuals from ancient times. Recently, it has become 
even more prominent because of the necessity of sending secure and private infor- 
mation, such as credit card numbers, over essentially open communication systems. 

In general both the plaintext message (uncoded message) and the ciphertext 
message (coded message) are written in some N-letter alphabet which is usually the 
same for both plaintext and code. The method of coding or the encoding algorithm 
is then a transformation of the N-letters. The most common way to perform this 
transformation is to consider the N letters as N integers modulo N and then perform 
a number theoretical function on them. Therefore most encoding algorithms use 
modular arithmetic and hence cryptography is closely tied to number theory. In 
this section we give a brief overview of cryptography and some number theoretic 
algorithms used in encryption. The subject is very broad, and as mentioned above, 
very current, due to the need for publicly viewed but coded messages. There are
many references to the subject. The book by Koblitz [Ko] gives an outstanding 
introduction to the interaction between number theory and cryptography. It also 
includes many references to other sources. The book by Baumslag, Fine, Kreuzer 
and Rosenberger [BFKR] provides a further comprehensive look at mathematical 
cryptography while the book by Stinson [St] describes the whole area. 

Modern cryptography is usually separated into classical cryptography and public 
key cryptography. In the former, both the encoding and decoding algorithms are 
supposedly known only to the sender and receiver, usually referred to as Bob and 
Alice. In the latter, the encryption method is public knowledge but only the receiver 
knows how to decode. We make this more precise in Section 5.5 when we introduce 
public key methods. Here we present first the basic terminology used in classical 
cryptography. 

The message that one wants to send is written in plaintext and then converted into 
code. The coded message is written in ciphertext. The plaintext message and cipher- 
text message are written in some alphabets that are usually the same. The process of 
putting the plaintext message into code is called enciphering or encryption while the 
reverse process is called deciphering or decryption. Encryption algorithms break 
the plaintext and ciphertext message into message units. These are single letters or 
pairs of letters or more generally k-vectors of letters. The transformations are done 
on these message units and the encryption algorithm is a mapping from the set of 
plaintext message units to the set of ciphertext message units. Putting this into a 
mathematical formulation we let 


$$M = \text{set of all plaintext message units}, \qquad C = \text{set of all ciphertext message units}.$$

The encryption algorithm is then the application of a left invertible function

$$f: M \to C.$$

The function $f$ is the encryption map. The left inverse

$$g: C \to M$$

is the decryption or deciphering map. The triple $\{M, C, f\}$, consisting of a set of plaintext
message units, a set of ciphertext message units and an encryption map, is called a
cryptosystem.

Breaking a code is called cryptanalysis. An attempt to break a code is called an 
attack. Often cryptanalysis depends on a statistical frequency analysis of the plaintext 
language used (see the exercises). Cryptanalysis depends also on a knowledge of the 
form of the code, that is, the type of cryptosystem used. 

We now give some examples of cryptosystems and cryptanalysis. 
EXAMPLE 5.4.1 The simplest type of encryption algorithm is a permutation cipher. Here the
letters of the plaintext alphabet are permuted and the plaintext message is sent in the
permuted letters. Mathematically, if the alphabet has $N$ letters and $\sigma$ is a permutation
of $\{1, \ldots, N\}$, the letter $i$ in each message unit is replaced by $\sigma(i)$. For
example, suppose the plaintext language is English, the plaintext word is BOB and the
permutation algorithm is


abcdefghijkim 
bedfghjklilnopr 


nopqrstuvwxyz 
stuwxaeizmaqyu 


then bob $\to$ ctc.

EXAMPLE 5.4.2 A very straightforward example of a permutation encryption algorithm is a shift
algorithm. Here we consider the plaintext alphabet as the integers $0, 1, \ldots, N - 1 \bmod N$.
We choose a fixed integer $k$ and the encryption algorithm is

$$f: m \mapsto m + k \bmod N.$$

This is often known as a Caesar code after Julius Caesar, who supposedly invented it. It was
used by the Union Army during the American Civil War. For example, if both the plaintext and
ciphertext alphabets were English and each message unit was a single letter, then the number of
letters is $N = 26$. Suppose $k = 5$ and we wish to send the message ATTACK. If A = 0, then
ATTACK is the numerical sequence $0, 19, 19, 0, 2, 10$. Adding $5$ gives $5, 24, 24, 5, 7, 15$,
so the encoded message is FYYFHP.
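The shift cipher translates directly into code. A minimal sketch, assuming the 26-letter English alphabet with A = 0 and uppercase input; the function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase   # A = 0, ..., Z = 25 (assumed alphabet)


def shift_encrypt(text, k, alphabet=ALPHABET):
    """Caesar cipher f(m) = m + k mod N, applied letter by letter."""
    N = len(alphabet)
    return "".join(alphabet[(alphabet.index(ch) + k) % N] for ch in text)


def shift_decrypt(text, k, alphabet=ALPHABET):
    """Inverse map m -> m - k mod N."""
    return shift_encrypt(text, -k, alphabet)
```

With `k = 5`, `shift_encrypt("ATTACK", 5)` gives the encoded message of the example.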

Any permutation encryption algorithm which goes letter to letter is very simple 
to attack using a statistical analysis. If enough messages are intercepted and the 
plaintext language is guessed then a frequency analysis of the letters will suffice 
to crack the code. For example in the English language the three most commonly 
occurring letters are E, T and A with a frequency of occurrence of approximately 
13%, 9% and 8%, respectively. By examining the frequency of occurrence of letters in the
ciphertext, the letters corresponding to E, T and A can be uncovered (see the exercises).
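The frequency attack on a shift cipher can be sketched as follows, under the simplifying assumption that the most frequent ciphertext letter is the image of E; a real attack would compare the whole frequency profile and needs enough ciphertext. The function name is hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase   # A = 0, ..., Z = 25


def guess_shift(ciphertext, likely_plain="E"):
    """Guess k in f(m) = m + k mod 26, assuming the top letter encrypts E."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    top, _ = Counter(letters).most_common(1)[0]
    return (ALPHABET.index(top) - ALPHABET.index(likely_plain)) % 26
```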

EXAMPLE 5.4.3 A variation on the Caesar code is the Vigenère code. Here message units are
considered as $k$-vectors of integers mod $N$ from an $N$-letter alphabet. Let
$B = (b_1, \ldots, b_k)$ be a fixed $k$-vector in $\mathbb{Z}_N^k$. The Vigenère code then takes a
message unit

$$(a_1, \ldots, a_k) \mapsto (a_1 + b_1, \ldots, a_k + b_k) \bmod N.$$

From a cryptanalysis point of view a Vigenère code is no more secure than a Caesar code and is
susceptible to the same type of statistical attack.
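A sketch of this code for the English alphabet, assuming the key vector $B$ is given as a list of integers and that a final block shorter than $k$ simply uses the first entries of $B$ (the text does not specify padding). The function name is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase   # A = 0, ..., Z = 25


def vigenere(text, key, decrypt=False, alphabet=ALPHABET):
    """Add (or subtract) the key vector B = (b_1, ..., b_k) blockwise mod N."""
    N, k = len(alphabet), len(key)
    sign = -1 if decrypt else 1
    return "".join(
        alphabet[(alphabet.index(ch) + sign * key[i % k]) % N]
        for i, ch in enumerate(text)
    )
```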

The Alberti code is a polyalphabetic cipher and can often be used to thwart a statistical
frequency attack. Originally this type of polyalphabetic cipher was developed by Leon Alberti
around 1470. Nowadays it is more commonly referred to as a Vigenère code after Blaise Vigenère,
who worked a century after Alberti. A version of a polyalphabetic cipher is described in the
next example.


EXAMPLE 5.4.4 Suppose we have an $N$-letter alphabet. We then form an $N \times N$ matrix $P$
where each row and each column is a distinct permutation of the plaintext alphabet. Hence each
row and each column of $P$ is a permutation of the integers $0, \ldots, N - 1$. Bob and Alice
decide on a keyword. The keyword is placed above the plaintext message, and the intersection of
the keyword letter and the plaintext letter below it determines which cipher alphabet to use.
We make this precise with a 9-letter alphabet A, B, C, D, E, O, S, T, U. Here for simplicity we
assume that each row is just a shift of the previous row, but any permutation can be used.


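$$\begin{array}{c|ccccccccc}
\text{alphabet} \backslash \text{key letter} & A & B & C & D & E & O & S & T & U \\ \hline
A & a & b & c & d & e & o & s & t & u \\
B & b & c & d & e & o & s & t & u & a \\
C & c & d & e & o & s & t & u & a & b \\
D & d & e & o & s & t & u & a & b & c \\
E & e & o & s & t & u & a & b & c & d \\
O & o & s & t & u & a & b & c & d & e \\
S & s & t & u & a & b & c & d & e & o \\
T & t & u & a & b & c & d & e & o & s \\
U & u & a & b & c & d & e & o & s & t
\end{array}$$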


Suppose the plaintext message is STAB DOC and Bob and Alice have chosen the keyword BET. We
place the keyword repeatedly over the message:

$$\begin{array}{ccccccc}
B & E & T & B & E & T & B \\
S & T & A & B & D & O & C
\end{array}$$

To encode, we look at B, which lies over S. The intersection of the B key letter and the S
alphabet is a t, so we encrypt the S with T. The next key letter is E, which lies over T. The
intersection of the E key letter with the T alphabet is c. Continuing in this manner and
ignoring the space, we get the encryption

$$\text{STAB DOC} \to \text{TCTCTDD}.$$
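The tableau of this example and the encryption rule can be sketched as follows. The sketch assumes, as in the example, that each row is a cyclic shift of the alphabet, and that letters outside the alphabet (such as the space) are dropped; the function names are hypothetical.

```python
ALPHABET = "ABCDEOSTU"   # the 9-letter alphabet of Example 5.4.4


def tableau(alphabet=ALPHABET):
    """Row for plaintext letter X: the alphabet cyclically shifted to start at X."""
    return {x: alphabet[i:] + alphabet[:i] for i, x in enumerate(alphabet)}


def poly_encrypt(plaintext, keyword, alphabet=ALPHABET):
    """Cipher letter = entry in row (plaintext letter), column (key letter)."""
    rows = tableau(alphabet)
    letters = [ch for ch in plaintext if ch in alphabet]   # ignore spaces
    return "".join(
        rows[ch][alphabet.index(keyword[i % len(keyword)])]
        for i, ch in enumerate(letters)
    )


def poly_decrypt(ciphertext, keyword, alphabet=ALPHABET):
    """Find the row whose entry in the key-letter column is the cipher letter."""
    rows = tableau(alphabet)
    out = []
    for i, ch in enumerate(ciphertext):
        col = alphabet.index(keyword[i % len(keyword)])
        out.append(next(x for x, row in rows.items() if row[col] == ch))
    return "".join(out)
```

Decryption works because each column of the tableau is itself a permutation of the alphabet, so the matching row is unique.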


EXAMPLE 5.4.5 A final example, which is not number theory based, is the so- 
called Beale Cipher. This has a very interesting history which is related in the popular 
book Archimedes' Revenge by P. Hoffman (see [Ho]). Here letters are encrypted by
numbering the first letters of each word in some document like the Declaration of 
Independence or the Bible. There will then be several choices for each letter and a 
Beale cipher is quite difficult to attack. 
5.4.1 Some Number Theoretic Cryptosystems


Here we describe some basic number theoretically derived cryptosystems. In applying a
cryptosystem to an $N$-letter alphabet we consider the letters as integers mod $N$. The
encryption algorithms then apply number theoretic functions and use modular arithmetic on these
integers. One example of this was the shift or Caesar cipher described in Example 5.4.2. In this
encryption method a fixed integer $k$ is chosen and the encryption map is given by

$$f: m \mapsto m + k \bmod N.$$


The shift algorithm is a special case of an affine cipher. Recall that an affine map on a ring
$R$ is a function $f(x) = ax + b$ with $a, b, x \in R$. We apply such a map to the ring
$R = \mathbb{Z}_N$ as the encryption map. Specifically, again suppose we have an $N$-letter
alphabet and we consider the letters as the integers $0, 1, \ldots, N - 1 \bmod N$, that is, in
the ring $\mathbb{Z}_N$. We choose integers $a, b \in \mathbb{Z}_N$ with $(a, N) = 1$ and
$b \neq 0$. The integers $a, b$ are called the keys of the cryptosystem. The encryption map is
then given by

$$f: m \mapsto am + b \bmod N.$$
EXAMPLE 5.4.1.1 Using an affine cipher with the English language and keys $a = 3$, $b = 5$,
encode the message EAT AT JOE'S. Ignore spaces and punctuation.
The numerical sequence for the message, ignoring the spaces and punctuation, is

$$4, 0, 19, 0, 19, 9, 14, 4, 18.$$

Applying the map $f(m) = 3m + 5 \bmod 26$ we get

$$17, 5, 62, 5, 62, 32, 47, 17, 59 \equiv 17, 5, 10, 5, 10, 6, 21, 17, 7 \bmod 26.$$

Now rewriting these as letters we get

$$\text{EAT AT JOE'S} \to \text{RFKFKGVRH}.$$


Since $(a, N) = 1$, the integer $a$ has a multiplicative inverse mod $N$. The decryption map for
an affine cipher with keys $a, b$ is then

$$f^{-1}: m \mapsto a^{-1}(m - b) \bmod N.$$
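Encryption and decryption for the affine cipher can be sketched as follows, assuming the English alphabet with A = 0; `pow(a, -1, N)` (Python 3.8 or later) computes $a^{-1} \bmod N$. The function names are hypothetical.

```python
import math
import string

ALPHABET = string.ascii_uppercase   # A = 0, ..., Z = 25


def affine_encrypt(text, a, b, alphabet=ALPHABET):
    """f(m) = am + b mod N; spaces and punctuation are dropped."""
    N = len(alphabet)
    if math.gcd(a, N) != 1:
        raise ValueError("the key a must be a unit modulo N")
    letters = [ch for ch in text.upper() if ch in alphabet]
    return "".join(alphabet[(a * alphabet.index(ch) + b) % N] for ch in letters)


def affine_decrypt(text, a, b, alphabet=ALPHABET):
    """f^{-1}(m) = a^{-1}(m - b) mod N."""
    N = len(alphabet)
    a_inv = pow(a, -1, N)           # exists because gcd(a, N) = 1
    return "".join(alphabet[a_inv * (alphabet.index(ch) - b) % N] for ch in text)
```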


Since an affine cipher, as given above, goes letter to letter, it is easy to attack using a
statistical frequency approach. Further, if an attacker can determine two letters and knows that
it is an affine cipher, the keys can often easily be determined and the code broken (see the
exercises). To give better security it is preferable to use $k$-vectors of letters as message
units. The form of an affine cipher then becomes

$$f: v \mapsto Av + B,$$

where $v$ and $B$ are $k$-vectors from $\mathbb{Z}_N^k$ and $A$ is an invertible $k \times k$
matrix with entries from the ring $\mathbb{Z}_N$. The computations are then done modulo $N$.
Since $v$ is a $k$-vector and $A$ is a $k \times k$ matrix, the matrix product $Av$ produces
another $k$-vector from $\mathbb{Z}_N^k$. Adding the $k$-vector $B$ again produces a $k$-vector,
so the ciphertext message unit is again a $k$-vector. The keys for this affine cryptosystem are
the enciphering matrix $A$ and the shift vector $B$. The matrix $A$ is chosen to be invertible
over $\mathbb{Z}_N$ (equivalently, the determinant of $A$ is a unit in the ring
$\mathbb{Z}_N$), so the decryption map is given by

$$v \mapsto A^{-1}(v - B).$$

Here $A^{-1}$ is the matrix inverse over $\mathbb{Z}_N$ and $v$ is a $k$-vector.

A statistical frequency attack on such a cryptosystem requires knowledge, within a given
language, of the statistical frequency of $k$-strings of letters. This is more difficult to
determine than the statistical frequency of single letters. As for a letter-to-letter affine
cipher, if $k + 1$ message units, where $k$ is the message block length, are discovered, then
the code can often easily be broken.

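EXAMPLE 5.4.1.2 Using an affine cipher with message units of length 2 in the English language
and keys

$$A = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 \\ 3 \end{pmatrix},$$

encode the message EAT AT JOE'S. Again ignore spaces and punctuation.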
Message units of length 2, that is, 2-vectors of letters, are called digraphs. We first must
place the plaintext message in terms of these message units. The numerical sequence for the
message EAT AT JOE'S, ignoring the spaces and punctuation, is as before

$$4, 0, 19, 0, 19, 9, 14, 4, 18.$$

Therefore the message units are

$$(4, 0), (19, 0), (19, 9), (14, 4), (18, 18),$$

repeating the last letter to end the message.
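The enciphering matrix $A$ has determinant $27$, which is $1$ modulo $26$ and hence is a unit
mod 26. Therefore $A$ is invertible and so it is a valid key.
Now we must apply the map $f(v) = Av + B \bmod 26$ to each digraph. For example

$$\begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}\begin{pmatrix} 4 \\ 0 \end{pmatrix} + \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 20 \\ 32 \end{pmatrix} + \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 25 \\ 35 \end{pmatrix} \equiv \begin{pmatrix} 25 \\ 9 \end{pmatrix} \bmod 26.$$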


Doing this to the other message units we obtain

$$(25, 9), (22, 25), (5, 10), (1, 13), (9, 13).$$

Now rewriting these as digraphs of letters we get

$$(Z, J), (W, Z), (F, K), (B, N), (J, N).$$

Therefore the coded message is

$$\text{EAT AT JOE'S} \to \text{ZJWZFKBNJN}.$$
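EXAMPLE 5.4.1.3 Suppose we receive the message ZJWZFKBNJN and we wish to decode it. We know
that an affine cipher with message units of length 2 in the English language and keys

$$A = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$$

is being used.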
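The decryption map is given by

$$v \mapsto A^{-1}(v - B),$$

so we must find the inverse matrix for $A$. For a $2 \times 2$ invertible matrix
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we have

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = (ad - bc)^{-1} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

where $(ad - bc)^{-1}$ is the inverse of the determinant in $\mathbb{Z}_{26}$.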


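Therefore, in this case, recalling that arithmetic is mod 26 and that the determinant of $A$ is
$1$ mod 26,

$$A^{-1} = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}^{-1} = \begin{pmatrix} 7 & -1 \\ -8 & 5 \end{pmatrix}.$$

The message ZJWZFKBNJN in terms of message units is

$$(25, 9), (22, 25), (5, 10), (1, 13), (9, 13).$$

We apply the decryption map to each digraph. For example

$$f^{-1}\begin{pmatrix} 25 \\ 9 \end{pmatrix} = \begin{pmatrix} 7 & -1 \\ -8 & 5 \end{pmatrix}\left(\begin{pmatrix} 25 \\ 9 \end{pmatrix} - \begin{pmatrix} 5 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} 7 & -1 \\ -8 & 5 \end{pmatrix}\begin{pmatrix} 20 \\ 6 \end{pmatrix} = \begin{pmatrix} 134 \\ -130 \end{pmatrix} \equiv \begin{pmatrix} 4 \\ 0 \end{pmatrix} \bmod 26.$$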
Doing this to each we obtain 


(4, 0), (19, 0), (19, 9), (14, 4), (18, 18) 
and rewriting in terms of letters

$$(E, A), (T, A), (T, J), (O, E), (S, S).$$

This gives us

$$\text{ZJWZFKBNJN} \to \text{EATATJOESS}.$$
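The digraph cipher of Examples 5.4.1.2 and 5.4.1.3 can be sketched for $k = 2$. The sketch assumes the English alphabet with A = 0 and pads an odd-length message by repeating its last letter, as in the example; the helper names are hypothetical. Python's `pow(x, -1, N)` (3.8 or later) raises `ValueError` when the determinant is not a unit mod $N$.

```python
import string

ALPHABET = string.ascii_uppercase   # A = 0, ..., Z = 25
N = 26


def to_digraphs(text):
    """Numerical digraphs; an odd message is padded by repeating its last letter."""
    nums = [ALPHABET.index(ch) for ch in text.upper() if ch in ALPHABET]
    if len(nums) % 2:
        nums.append(nums[-1])
    return [(nums[i], nums[i + 1]) for i in range(0, len(nums), 2)]


def inverse_2x2(A):
    """Inverse of ((a, b), (c, d)) over Z_N via (ad - bc)^{-1} (d, -b; -c, a)."""
    (a, b), (c, d) = A
    det_inv = pow((a * d - b * c) % N, -1, N)
    return ((d * det_inv % N, -b * det_inv % N),
            (-c * det_inv % N, a * det_inv % N))


def mat_vec(A, v, B=(0, 0)):
    """A v + B over Z_N for a 2 x 2 matrix and 2-vectors."""
    return ((A[0][0] * v[0] + A[0][1] * v[1] + B[0]) % N,
            (A[1][0] * v[0] + A[1][1] * v[1] + B[1]) % N)


def affine2_encrypt(text, A, B):
    """f(v) = Av + B mod 26 on each digraph."""
    return "".join(ALPHABET[x] for v in to_digraphs(text) for x in mat_vec(A, v, B))


def affine2_decrypt(text, A, B):
    """f^{-1}(v) = A^{-1}(v - B) mod 26 on each digraph."""
    A_inv = inverse_2x2(A)
    out = []
    for v in to_digraphs(text):
        out.extend(mat_vec(A_inv, ((v[0] - B[0]) % N, (v[1] - B[1]) % N)))
    return "".join(ALPHABET[x] for x in out)
```

With the keys of the example, `affine2_encrypt("EAT AT JOE'S", ((5, 1), (8, 7)), (5, 3))` reproduces the ciphertext, and decryption returns the padded plaintext.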


5.5 Public Key Cryptography and the RSA Algorithm 


Presently there are many instances where secure information must be sent over open 
communication lines. These include for example banking and financial transactions, 
purchasing items via credit cards over the internet and similar things. This led to the 
development of public key cryptography. Roughly, in classical cryptography only 
the sender and receiver know the encoding and decoding methods. Further it is a 
feature of such cryptosystems, such as the ones that we have looked at, that if the 
encrypting method is known the decrypting can be carried out. In public key cryp- 
tography the encryption method is public knowledge but only the receiver knows 
how to decode. More precisely in a classical cryptosystem once the encrypting algo- 
rithm is known the decryption algorithm can be implemented in approximately the 
same order of magnitude of time. In a public key cryptosystem, developed first by 
Diffie and Hellman, the decryption algorithm is much more difficult to implement. 
This difficulty depends on the type of computing machinery used (much as with primality
testing), and as computers get better, new and more secure public key cryptosystems become
necessary.

The basic idea in a public key cryptosystem is to have a one-way function, that is, a function
which is easy to implement but very hard to left invert. Hence it becomes simple to encrypt a
message but very hard, unless you know the left inverse, to decrypt. The standard model for
public key systems is the following. We assume that the set $M$ of plaintext message units is
the same as the set $C$ of ciphertext message units and that the decrypting map is the inverse
of the encrypting map. Alice wants to send a message to Bob. The encrypting map $f_A$ for Alice
is public knowledge, as is the encrypting map $f_B$ for Bob. On the other hand, the decryption
algorithms $g_A$ and $g_B$ are secret and known only to Alice and Bob, respectively. Let $m$ be
the message Alice wants to send to Bob. She sends $f_B(g_A(m))$. To decode, Bob first applies
$g_B$, which only he knows. This gives him $g_B(f_B(g_A(m))) = g_A(m)$. He then looks up $f_A$,
which is publicly available, and applies it, $f_A(g_A(m)) = m$, to obtain the message. Why not
just send $f_B(m)$? Bob is the only one who can decode this. The idea is authentication, that
is, being certain from Bob's point of view that the message really came from Alice. Suppose $p$
is Alice's verification, for example her signature, social security number, etc. If Bob
receives $f_B(p)$, it could have been sent by anyone, since $f_B$ is public. On the other hand,
since only Alice supposedly knows $g_A$, getting a reasonable message from
$f_A(g_B(f_B(g_A(p))))$ would verify that it is from Alice. Applying $g_B$ alone should result
in nonsense.
Getting a reasonable one-way function can be a formidable task. The most widely used (at
present) public key systems are based on number theoretic functions that are difficult to
invert. Diffie and Hellman in 1976 developed the original public key idea using the discrete log
problem. In modular arithmetic it is easy to raise an element to a power but difficult to
determine, given an element, whether it is a power of another element. Specifically, if $G$ is
a finite group, such as the cyclic multiplicative group of $\mathbb{Z}_p$ where $p$ is a prime,
and $h = g^k$ for some $k$, then the discrete log of $h$ to the base $g$ is any integer $t$ with
$h = g^t$. The rough form of the Diffie-Hellman public key system is as follows. Bob and Alice
will use a classical cryptosystem based on a key $k$ with $1 < k < q - 1$ where $q$ is a prime.
It is the key $k$ that Alice must send to Bob. Let $g$ be a multiplicative generator of
$\mathbb{Z}_q^*$. Alice chooses an $a \in \mathbb{Z}_q$ with $1 < a < q - 1$. She makes public
$g^a$. Bob chooses an element $b \in \mathbb{Z}_q^*$ and makes public $g^b$. The secret key is
$g^{ab}$. Both Bob and Alice, but presumably no one else, can discover this key. Alice knows her
secret power $a$, and the value $g^b$ is public from Bob. Hence she can compute the key
$g^{ab} = (g^b)^a$. The analogous situation holds for Bob. An attacker, however, only knows $g$,
$g^a$ and $g^b$. Unless the attacker can solve the discrete log problem, that is, find the
exponent $a$ from $g$ and $g^a$ (or $b$ from $g$ and $g^b$), the key exchange is secure.
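A toy version of the exchange, assuming a small prime $q$ and a generator $g$ of $\mathbb{Z}_q^*$ chosen by hand; real parameters are hundreds of digits long. The function names are hypothetical.

```python
import random


def public_value(q, g, secret):
    """The value g^secret mod q that a participant publishes."""
    return pow(g, secret, q)


def shared_key(q, other_public, secret):
    """(g^other)^secret mod q, computed from the partner's public value."""
    return pow(other_public, secret, q)


def exchange(q, g, rng):
    """One toy run: both sides derive the same key g^(ab) mod q."""
    a = rng.randrange(2, q - 1)          # Alice's secret exponent
    b = rng.randrange(2, q - 1)          # Bob's secret exponent
    A, B = public_value(q, g, a), public_value(q, g, b)
    return shared_key(q, B, a), shared_key(q, A, b)
```

With $q = 23$, the generator $g = 5$ and secret exponents $a = 4$, $b = 3$, the public values are $4$ and $10$ and both sides obtain the key $18$.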

In 1977 Rivest, Shamir, and Adleman developed the RSA Algorithm, which is presently one of the
most widely used public key cryptosystems. It is based on the difficulty of factoring large
integers, and in particular on the fact that it is easier to test for primality (hence the
inclusion in this chapter) than to factor. It works as follows. Alice chooses two large primes
$p_A, q_A$ and an integer $e_A$ relatively prime to $\phi(p_A q_A) = (p_A - 1)(q_A - 1)$. It is
assumed that these integers are chosen randomly to minimize attack. The primality tests arise in
the following manner. Alice first randomly chooses a large odd integer $m$ and tests it for
primality. If it is prime, it is used. If not, she tests $m + 2, m + 4, \ldots$ and so on until
she gets her first prime $p_A$. She then repeats the process to get $q_A$. Similarly, she chooses
another odd integer $m$ and tests until she gets an $e_A$ relatively prime to $\phi(p_A q_A)$.
The primes she chooses should be quite large. Originally RSA used primes of approximately 100
decimal digits, but as computing and attacks have become more sophisticated, larger primes have
had to be utilized. We will say more about this shortly. Once Alice has obtained
$p_A, q_A, e_A$, she lets $n_A = p_A q_A$ and computes $d_A$, the multiplicative inverse of
$e_A$ modulo $\phi(n_A)$. That is, $d_A$ satisfies $e_A d_A \equiv 1 \bmod (p_A - 1)(q_A - 1)$.
She makes public the enciphering key $K_A = (n_A, e_A)$, and the encryption algorithm known to
all is

$$f_A(m) = m^{e_A} \bmod n_A,$$

where $m \in \mathbb{Z}_{n_A}$ is a message unit. It can be shown that if

$$(e_A, (p_A - 1)(q_A - 1)) = 1 \quad \text{and} \quad e_A d_A \equiv 1 \bmod (p_A - 1)(q_A - 1),$$

then $m^{e_A d_A} \equiv m \bmod n_A$ (see the exercises). Therefore the decryption algorithm is

$$g_A(c) = c^{d_A} \bmod n_A.$$
Notice then that $g_A(f_A(m)) = m^{e_A d_A} \equiv m \bmod n_A$, so it is the inverse.

Now Bob makes the same type of choices to obtain $p_B, q_B, e_B$. He lets $n_B = p_B q_B$ and
makes public his key $K_B = (n_B, e_B)$.
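The key generation just described can be sketched as follows. The primality test is Miller-Rabin with the first twelve primes as fixed bases, a choice of ours that is known to give exact answers for $n < 2^{64}$ and is only a probabilistic test beyond that. Key sizes here are toy sizes, and all names are hypothetical.

```python
import math
import random

MR_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n):
    """Miller-Rabin with fixed bases (exact for n < 2^64)."""
    if n < 2:
        return False
    for p in MR_BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in MR_BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                  # a is a witness: n is composite
    return True


def next_prime(m):
    """Test m, m + 2, m + 4, ... (m made odd) until a prime is found."""
    m |= 1
    while not is_probable_prime(m):
        m += 2
    return m


def rsa_keygen(bits, rng):
    """Toy RSA keys following the text: public (n, e) and secret d."""
    p = next_prime(rng.getrandbits(bits) | (1 << (bits - 1)))
    q = p
    while q == p:
        q = next_prime(rng.getrandbits(bits) | (1 << (bits - 1)))
    phi = (p - 1) * (q - 1)
    e = rng.randrange(3, phi) | 1         # random odd starting point
    while math.gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)                   # e d = 1 mod (p - 1)(q - 1)
    return (p * q, e), d
```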

If Alice wants to send a message to Bob that can be authenticated to be from Alice, she sends
$f_B(g_A(m))$, where here we assume that $n_A < n_B$. An attack then requires factoring $n_A$ or
$n_B$, which is much more difficult than generating the primes $p_A, q_A, p_B, q_B$.

In practice, suppose there is an $N$-letter alphabet which is to be used for both plaintext and
ciphertext. The plaintext message is to consist of $k$-vectors of letters and the ciphertext
message of $\ell$-vectors of letters with $k < \ell$. Each of the $k$ plaintext letters in a
message unit $m$ is then considered as an integer mod $N$, and the whole plaintext message unit
is considered as a $k$-digit integer written to the base $N$ (see the example below). The
transformed message is then written as an $\ell$-digit integer to the base $N$, and the digits
are then considered as integers mod $N$ from which the encrypted letters are found. To ensure
that the ranges of plaintext messages and ciphertext messages are the same, $k < \ell$ are
chosen so that

$$N^k < n_U < N^\ell$$

for each user $U$, where $n_U = p_U q_U$. In this case any plaintext message $m$ is an integer
less than $N^k$, considered as an element of $\mathbb{Z}_{n_U}$. Since $n_U < N^\ell$, the image
under the power transformation corresponds to an $\ell$-digit integer written to the base $N$
and hence to an $\ell$-letter block. We give an example with relatively small primes. In
real-world applications the primes would be chosen to have over a hundred digits, and the
computations and choices must be done using good computing machinery.

EXAMPLE 5.4.2.1 Suppose $N = 26$, $k = 2$ and $\ell = 3$. Suppose further that Alice chooses
$p_A = 29$, $q_A = 41$, $e_A = 13$. Here $n_A = 29 \cdot 41 = 1189$, so she makes public the key
$K_A = (1189, 13)$. She then computes the multiplicative inverse $d_A$ of $13 \bmod 1120$, where
$1120 = 28 \cdot 40$; this gives $d_A = 517$, since $13 \cdot 517 = 6721 = 6 \cdot 1120 + 1$. Now
suppose we want to send her the message TABU. Since $k = 2$, the message units in plaintext are
2-vectors of letters, so we separate the message into TA BU. We show how to send TA. First the
numerical sequence for the letters TA mod 26 is $(19, 0)$. We then use these as the digits of a
2-digit number to the base 26. Hence

$$\text{TA} = 19 \cdot 26 + 0 \cdot 1 = 494.$$

We now compute the power transformation using her $e_A = 13$ to evaluate

$$f(19, 0) = 494^{13} \bmod 1189.$$

This is evaluated as $320$. Now we write $320$ to the base 26. By our choices of $k, \ell$ this
can be written with a maximum of 3 digits to this base. Then

$$320 = 0 \cdot 26^2 + 12 \cdot 26 + 8.$$
The letters in the encoded message then correspond to $(0, 12, 8)$, and therefore the encryption
of TA is AMI.
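Example 5.4.2.1 can be reproduced with the sketch below, which hard-codes the example's parameters and converts blocks of letters to and from base-26 integers; helper names are hypothetical. Python's three-argument `pow` performs the modular exponentiation, and `pow(e, -1, m)` (Python 3.8 or later) the modular inverse.

```python
import string

ALPHABET = string.ascii_uppercase   # N = 26 letters, A = 0
N, K, L = 26, 2, 3                  # plaintext k-blocks, ciphertext l-blocks


def block_to_int(block):
    """Read a block of letters as an integer written to the base N."""
    value = 0
    for ch in block:
        value = value * N + ALPHABET.index(ch)
    return value


def int_to_block(value, length):
    """Write value to the base N with exactly `length` digits (leading A's)."""
    digits = []
    for _ in range(length):
        value, r = divmod(value, N)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


p, q, e = 29, 41, 13                # Alice's choices in the example
n = p * q                           # 1189, with N**K < n < N**L
d = pow(e, -1, (p - 1) * (q - 1))   # inverse of 13 modulo 1120


def rsa_encrypt_block(block):
    """k-letter block -> l-letter block via m -> m^e mod n."""
    return int_to_block(pow(block_to_int(block), e, n), L)


def rsa_decrypt_block(block):
    """l-letter block -> k-letter block via c -> c^d mod n."""
    return int_to_block(pow(block_to_int(block), d, n), K)
```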

To decode the message, Alice uses her secret $d_A$ and applies the inverse transformation.

Since we have assumed that $k < \ell$, this seems to restrict the direction in which messages
can be sent. In practice, to allow messages to go between any two users, the following is done.
Suppose Alice is sending an authenticated message to Bob. The keys $K_A = (n_A, e_A)$,
$K_B = (n_B, e_B)$ are public. If $n_A < n_B$, Alice sends $f_B(g_A(m))$. On the other hand, if
$n_A > n_B$, she sends $g_A(f_B(m))$.

The computations and choices used in real-world implementations of the RSA algorithm must be done with computers. Similarly, attacks on RSA are carried out by computer. As computing machinery gets stronger and factoring algorithms get faster, RSA becomes less secure and larger and larger primes must be used. To combat this, other public key methods are in various stages of development. RSA, Diffie-Hellman and many related public key cryptosystems rely on properties of abelian groups. In recent years a great deal of work has been done on encrypting and decrypting with certain nonabelian groups, such as linear groups or braid groups (see [AG] or [BFX]). Complete treatments of group-based cryptography can be found in the books [St], [MSU] and [BFKR].


5.6 Elliptic Curve Cryptography 


In Section 5.3.4 we discussed elliptic curves and how they can be used in primality testing. Here we show how they can be used effectively in public key cryptography. The standard public key systems that we have described so far, the Diffie-Hellman and RSA systems, require very large key spaces. In an attempt to use the same ideas but reduce the key space size, it was suggested that the Diffie-Hellman method be applied to other abelian groups. To accomplish this, algebraic geometry was introduced into cryptography. In 1985, Neal Koblitz and, independently, Victor Miller suggested the use of elliptic curves over finite fields, and their corresponding groups, as possible cryptographic platforms. These methods have been quite successful and result, in many cases, in faster encryption and smaller key spaces than standard RSA methods. First we describe a basic encryption method developed by ElGamal.

In 1984, T. ElGamal devised a method to turn the Diffie-Hellman key exchange protocol into a public key encryption protocol. This is now known as ElGamal encryption. The basic scheme for an ElGamal encryption system is the following. Given a large prime $p$, there is a fixed, efficiently invertible procedure to encode a plaintext as residue classes within $\mathbb{Z}_p^*$, the unit group of $\mathbb{Z}_p$. Let $g$ be a generator of the cyclic group $\mathbb{Z}_p^*$.

Each user's public key is $(p, g, A)$, where $A = g^a$ for some integer $a$ that the user keeps secret.

The encryption and decryption work as follows. Suppose that Bob wants to send a message to Alice. Alice's public key, which is public knowledge, is $(p, g, A)$ as above. The message is $m$ and, as above, it is encoded in some workable, efficient manner within $\mathbb{Z}_p$; that is, the message is encoded in a manner known to all users (once $p$ is given) as an integer in $\{0, 1, \ldots, p - 1\}$. Bob now randomly chooses an integer $b$ and computes $B = g^b$. He now sends $(B, mC)$ to Alice, where $C = A^b = g^{ab}$. Notice that $C$ is the common shared key in the Diffie-Hellman key exchange, and in the encryption this is multiplied by the message $m$.

To decrypt, Alice first uses $B$ to determine the common shared key $C$. Since $B = g^b$ and she knows her secret exponent $a$ (with $A = g^a$), she can compute $C = B^a = g^{ab}$, for the same reasons that the Diffie-Hellman key exchange works. Since she knows $C = g^{ab}$ and the modulus $p$, she can compute the inverse $g^{-ab} = B^{p-1-a}$. This is efficient since it only requires one exponentiation modulo $p$. She then multiplies $mC = mg^{ab}$ by $g^{-ab}$ to obtain the message $m$.
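The protocol just described can be sketched in a few lines of Python, assuming the message has already been encoded as an integer $m$ with $1 \le m \le p - 1$. The toy prime 467 and base 2 are illustrative choices, not from the text, and the base is not checked to be a generator (decryption is correct for any nonzero base). The helper names are hypothetical, and decryption uses the identity $g^{-ab} = B^{p-1-a}$, which follows from Fermat's little theorem.

```python
import secrets

P, G = 467, 2       # toy prime and base (illustrative; a real p has hundreds of digits)


def keygen():
    """Alice: secret a, public A = g^a mod p."""
    a = secrets.randbelow(P - 2) + 1          # 1 <= a <= p - 2
    return a, pow(G, a, P)


def encrypt(m, A):
    """Bob: fresh random b, send (B, mC) with B = g^b and C = A^b = g^(ab)."""
    b = secrets.randbelow(P - 2) + 1
    B = pow(G, b, P)
    C = pow(A, b, P)                          # Diffie-Hellman shared key
    return B, (m * C) % P


def decrypt(B, masked, a):
    """Alice: C^(-1) = g^(-ab) = B^(p-1-a) by Fermat, then recover m."""
    C_inv = pow(B, P - 1 - a, P)
    return (masked * C_inv) % P


a, A = keygen()
B, masked = encrypt(123, A)
print(decrypt(B, masked, a))                  # 123
```

A fresh $b$ is drawn for every message, so encrypting the same $m$ twice gives different ciphertexts.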

Although ElGamal proposed using the cyclic groups $\mathbb{Z}_p^*$ for large primes $p$, this type of encryption can be used in any cyclic group where the discrete log problem is assumed hard. If the group is a cyclic subgroup within the group of an elliptic curve, ElGamal encryption becomes the basis for elliptic curve cryptography.

We assume the material on elliptic curves described in Section 5.3.4 and apply the ElGamal method to the group of an elliptic curve to obtain the elliptic curve cryptosystem. We restrict ourselves to odd prime numbers $p > 5$ and the corresponding finite fields $\mathbb{Z}_p$.

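Consider the elliptic curve (in Weierstrass form) over $\mathbb{Z}_p$ given by

$$y^2 = x^3 + ax + b, \quad a, b \in \mathbb{Z}_p,$$

with $\Delta = -4a^3 - 27b^2 \neq 0$ in $\mathbb{Z}_p$. Now let

$$E(\mathbb{Z}_p) = \{(x, y) \in \mathbb{Z}_p \times \mathbb{Z}_p : y^2 = x^3 + ax + b\} \cup \{0\}$$

be the elliptic curve group over $\mathbb{Z}_p$, where $0$ denotes the point at infinity, the identity of the group. The basic idea is to use the ElGamal method, and its dependence on the corresponding discrete log problem, in $E(\mathbb{Z}_p)$; that is, given $P \in E(\mathbb{Z}_p)$, $P \neq 0$, and $nP \in E(\mathbb{Z}_p)$, find $n$.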

We now define the elliptic curve encryption scheme, which we will abbreviate by ECES. This is also known as the Elliptic Curve ElGamal Cryptosystem or the Menezes-Vanstone Cryptosystem. It is, in general, efficient to encrypt and requires a smaller key space than the RSA method.


ECES Preparation 


1. Choose a large odd prime $p$ with $p > 5$ and $a, b \in \mathbb{Z}_p$ such that
$$y^2 = x^3 + ax + b$$
is an elliptic curve.
2. Choose an injective map $\rho : \mathcal{M} \to E(\mathbb{Z}_p) \setminus \{0\}$, efficiently invertible on its image, from the set $\mathcal{M}$ of plaintext units to $E(\mathbb{Z}_p)$. We describe such a choice below.
3. Choose a point $P \neq 0$ in $E(\mathbb{Z}_p)$.
4. Choose a secret integer $d \in \mathbb{Z}$ and calculate $dP \in E(\mathbb{Z}_p)$.

Encryption and Decryption in ECES


1. The public key is $(P, dP) \in E(\mathbb{Z}_p) \times E(\mathbb{Z}_p)$ with $P \neq 0$, together with the elliptic curve itself. The secret key is $d$.

2. Encryption: Let $m \in \mathcal{M}$ be a plaintext message unit. Calculate $Q = \rho(m)$. Choose a random integer $k \in \mathbb{Z}$ and define

$$c = (kP, Q + k(dP)) \in (E(\mathbb{Z}_p))^2 = \mathcal{C}.$$

This is the encrypted message unit.

3. Decryption: Let $c = (c_1, c_2) \in \mathcal{C}$ be a ciphertext unit. Calculate $Q = c_2 - dc_1$ and $m = \rho^{-1}(Q)$, the preimage of $Q$.
Recall that $Q = \rho(m) \in E(\mathbb{Z}_p)$ and $(c_1, c_2) = (kP, Q + k(dP))$.


Theorem 5.6.1 ECES provides a valid cryptosystem.


Proof Let $(c_1, c_2) = (kP, Q + k(dP))$. Then $c_2 - dc_1 = Q + k(dP) - d(kP) = Q = \rho(m)$, since $k(dP) = (kd)P = d(kP)$.
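The scheme can be tried out on a toy curve. The sketch below assumes the curve $y^2 = x^3 + 2x + 3$ over $\mathbb{Z}_{97}$ (an illustrative choice whose discriminant is nonzero mod 97), uses the standard chord-and-tangent formulas for the group law, represents the point at infinity $0$ by `None`, and takes the first affine point found by brute force as the base point $P$. Message units are assumed to be already encoded as points $Q = \rho(m)$; all function names are hypothetical.

```python
import secrets

# Toy curve y^2 = x^3 + A*x + B over Z_p (illustrative; 4A^3 + 27B^2 = 275 is nonzero mod 97).
P_MOD, A, B = 97, 2, 3
O = None                                   # the point at infinity (group identity)


def ec_neg(pt):
    return O if pt is O else (pt[0], (-pt[1]) % P_MOD)


def ec_add(p1, p2):
    """Chord-and-tangent addition on E(Z_p)."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                           # p2 = -p1 (includes doubling a point with y = 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)       # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)    # chord slope
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def ec_mul(n, pt):
    """n * pt for n >= 0 by double-and-add."""
    result = O
    while n > 0:
        if n & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        n >>= 1
    return result


def curve_points():
    return [(x, y) for x in range(P_MOD) for y in range(P_MOD)
            if (y * y - (x ** 3 + A * x + B)) % P_MOD == 0]


BASE = curve_points()[0]                   # public point P != 0


def keygen():
    d = secrets.randbelow(10 ** 6) + 1     # secret d
    return d, ec_mul(d, BASE)              # public key (P, dP)


def encrypt(Q, dP):
    k = secrets.randbelow(10 ** 6) + 1     # fresh random k per message unit
    return ec_mul(k, BASE), ec_add(Q, ec_mul(k, dP))       # (kP, Q + k(dP))


def decrypt(c, d):
    c1, c2 = c
    return ec_add(c2, ec_neg(ec_mul(d, c1)))               # c2 - d*c1 = Q


d, dP = keygen()
Q = curve_points()[5]
print(decrypt(encrypt(Q, dP), d) == Q)     # True
```

The decryption line is exactly the computation in the proof of Theorem 5.6.1: subtracting $d c_1 = d(kP)$ cancels the mask $k(dP)$.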


Notice that if the discrete log problem for $E(\mathbb{Z}_p)$ is solvable, that is, if we can calculate $d$ from $(P, dP)$, then the ECES is broken.

We now show how to construct an injective, efficiently invertible map $\mathcal{M} \to E(\mathbb{Z}_p) \setminus \{0\}$.

Let $y^2 = x^3 + ax + b$ be an elliptic curve over $\mathbb{Z}_p$ with $p > 5$. By Hasse's Theorem (see [Sil]) we have

$$|E(\mathbb{Z}_p)| \in [p + 1 - 2\sqrt{p},\ p + 1 + 2\sqrt{p}] \cap \mathbb{N}.$$

There are efficient probabilistic algorithms to generate points of $E(\mathbb{Z}_p)$ (see [BFKR]). We need many points in $E(\mathbb{Z}_p)$.
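Hasse's bound is easy to observe numerically for small primes. The following brute-force sketch (practical only for small $p$; the function names are illustrative) counts $|E(\mathbb{Z}_p)|$, including the point at infinity, by tabulating how many square roots each residue has, and checks that the trace $t = p + 1 - |E(\mathbb{Z}_p)|$ satisfies $t^2 \le 4p$, that is, $|t| \le 2\sqrt{p}$.

```python
def count_points(p, a, b):
    """|E(Z_p)| for y^2 = x^3 + ax + b, including the point at infinity."""
    roots = [0] * p                       # roots[v] = number of y in Z_p with y^2 = v
    for y in range(p):
        roots[y * y % p] += 1
    return 1 + sum(roots[(x ** 3 + a * x + b) % p] for x in range(p))


def satisfies_hasse(p, a, b):
    t = p + 1 - count_points(p, a, b)     # trace of Frobenius
    return t * t <= 4 * p                 # |t| <= 2 sqrt(p), in integer arithmetic


# Check every nonsingular curve over a few small prime fields.
for p in (7, 11, 13, 17):
    for a in range(p):
        for b in range(p):
            if (4 * a ** 3 + 27 * b ** 2) % p != 0:
                assert satisfies_hasse(p, a, b)

print(count_points(97, 2, 3))
```

Comparing $t^2$ with $4p$ avoids floating-point square roots entirely.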


1. Choose $k \in \mathbb{N}$ such that $1/2^k$ is an acceptable probability of error.
2. Let $\mathcal{M} = \{0, 1, \ldots, M\}$. We should have $p > (M + 2)k$.
3. Define an injective map

$$\psi : \mathcal{M} \times \{1, \ldots, k\} \to \mathbb{Z}_p \quad \text{by} \quad (m, j) \mapsto mk + j.$$

Recall that $0 < mk + j < p$ because $p > (M + 2)k$.

4. Let $x = \psi(m, 1)$. Calculate $f(x) = x^3 + ax + b$ and check whether there exists $y \in \mathbb{Z}_p$ with $y^2 = f(x)$. If this is the case, then choose $y$ so that $y \in \{0, 1, \ldots, \frac{p-1}{2}\}$ and define $\rho(m) = (x, y)$.

We note that about half of the nonzero values $f(x)$ are quadratic residues modulo $p$, and each $x \in \mathbb{Z}_p$ gives 0, 1 or 2 points on the elliptic curve.

5. If $x = \psi(m, 1)$ and there is no $y \in \mathbb{Z}_p$ with $y^2 = f(x)$, then try $x = \psi(m, 2)$, $x = \psi(m, 3)$, and so on.
With probability at least $1 - 1/2^k$ we find an element $x \in \{\psi(m, 1), \ldots, \psi(m, k)\}$ with $f(x) = y^2$ for some $y \in \mathbb{Z}_p$.
If $j$, with $1 \le j \le k$, is the smallest integer such that $x = \psi(m, j)$ and $f(x) = y^2$ for some $y \in \mathbb{Z}_p$ (such a $j$ exists with probability at least $1 - 1/2^k$), then choose $y \in \{0, 1, \ldots, \frac{p-1}{2}\}$ and define $\rho(m) = (x, y)$.

6. If $(x, y) \in \operatorname{Im}(\rho) \subset E(\mathbb{Z}_p)$, then we may recover $m$ efficiently. If $x = mk + j$, then $m = \lfloor (x - 1)/k \rfloor$, because $1 \le j \le k$ and, since $p > (M + 2)k$, the integer $mk + j$ is not reduced modulo $p$.
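Steps 1 to 6 can be sketched as follows. The parameters are illustrative assumptions: the toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{Z}_{10007}$, $k = 20$ and $M = 400$, so that $p > (M + 2)k$. Euler's criterion tests whether $f(x)$ is a quadratic residue, and because $10007 \equiv 3 \pmod 4$, a square root of a residue $v$ can be taken as $v^{(p+1)/4}$; for other primes a general method such as Tonelli-Shanks would be needed. The names `psi`, `rho` and `rho_inverse` are hypothetical.

```python
# Illustrative parameters (assumptions): p = 10007 is prime and p % 4 == 3.
P_MOD, A, B = 10007, 2, 3
K = 20                      # up to K trials per message; failure probability about 2**-K
M = 400                     # plaintext units are 0, 1, ..., M
assert P_MOD % 4 == 3 and P_MOD > (M + 2) * K


def psi(m, j):
    """The injective map (m, j) -> mk + j."""
    return m * K + j


def rho(m):
    """Embed m as a point (x, y) with y <= (p-1)/2, or return None after K failures."""
    for j in range(1, K + 1):
        x = psi(m, j)
        fx = (x ** 3 + A * x + B) % P_MOD
        if pow(fx, (P_MOD - 1) // 2, P_MOD) in (0, 1):     # Euler's criterion
            y = pow(fx, (P_MOD + 1) // 4, P_MOD)          # square root, valid as p = 3 mod 4
            return (x, min(y, P_MOD - y))
    return None


def rho_inverse(point):
    """Recover m from x = mk + j with 1 <= j <= k."""
    x, _ = point
    return (x - 1) // K


pt = rho(42)
if pt is not None:
    print(pt, rho_inverse(pt))
```

Choosing the smaller of the two square roots makes $\rho$ a function, and the $y$-coordinate plays no role in decoding.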


There has been extensive work on the cryptanalysis of ECES. We mention some general ideas and refer to [BFKR] for more information.

Recall that $E(\mathbb{Z}_p)$ and $P$ are public. An attacker has to calculate $|E(\mathbb{Z}_p)|$ or $|P|$, the order of $P$ in $E(\mathbb{Z}_p)$.


1. ECES is not secure if $|E(\mathbb{Z}_p)|$ has only small prime factors. Hence $|E(\mathbb{Z}_p)|$ should have at least one large prime factor (see [BFKR]).

2. Analogously, $|P|$ should have at least one large prime factor.

3. ECES is not secure if $|P| = p$. Here we can determine $|E(\mathbb{Z}_p)|$ effectively via the trace $t = p + 1 - |E(\mathbb{Z}_p)|$ of the Frobenius map, using what is called Schoof's algorithm (see [Sc 1,2]).
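For a toy curve, criteria 1 and 3 can be screened directly. The sketch below is an illustration, not a standard procedure: it counts points with the Legendre symbol (computed by Euler's criterion), finds the largest prime factor of $|E(\mathbb{Z}_p)|$ by trial division, and flags the case $|E(\mathbb{Z}_p)| = p$. For $p > 5$ Hasse's bound gives $|E(\mathbb{Z}_p)| < 2p$, so a point of order $p$ can exist only in that case. Real curves need Schoof-type point counting instead of this $O(p)$ loop; the function names are hypothetical.

```python
def legendre(v, p):
    """Legendre symbol (v/p) for an odd prime p via Euler's criterion."""
    s = pow(v % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s          # otherwise s is 0 or 1


def curve_order(p, a, b):
    """|E(Z_p)|: each x contributes 1 + (f(x)/p) points, plus the point at infinity."""
    return 1 + sum(1 + legendre(x ** 3 + a * x + b, p) for x in range(p))


def largest_prime_factor(n):
    largest, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            largest, n = d, n // d
        d += 1
    return n if n > 1 else largest


def screen_curve(p, a, b):
    order = curve_order(p, a, b)
    return {
        "order": order,
        "largest_prime_factor": largest_prime_factor(order),   # criterion 1: want this large
        "anomalous": order == p,                               # criterion 3: reject if True
    }


print(screen_curve(97, 2, 3))
```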


Elliptic curves that have passed all known attacks so far can be found at the website 
http://www.ecc-brainpool.org/ecc-standards.htm. 


5.7 The AKS Algorithm


The development of the AKS algorithm, and the fact that it runs in polynomial time, is the most recent major theoretical breakthrough in primality testing. Because of the timeliness and relative simplicity of the proof, we reproduce here the arguments in the original paper of Agrawal, Kayal, and Saxena [AKS]. There have already been substantial improvements (see [Bo], [Be]); however, the elegance of the original stands out. For the most part, this section, with some explanatory material, is taken directly from their paper. We first need the following notation. If $p(x), q(x)$ are integral polynomials, then we say

$$p(x) \equiv q(x) \bmod (x^r - 1, n)$$

if the remainders of $p(x)$ and $q(x)$ after division by $x^r - 1$ are equal (have equal coefficients) modulo $n$. Further, if $p$ is a prime, $o_p(r)$ is the multiplicative order of $r$ mod $p$. Two further number theoretic results are needed.
Lemma 5.7.1 ([Fou85], [BH96]) Let $P(n)$ denote the greatest prime divisor of $n$. Then there exist constants $c > 0$ and $n_0$ such that for all $x \ge n_0$

$$\left|\{p : p \text{ prime},\ p \le x \text{ and } P(p - 1) > x^{2/3}\}\right| \ge c\,\frac{x}{\log_2 x}.$$


Lemma 5.7.2 ([A]) If $\pi(x)$ is the standard prime counting function, then for $n > 1$,

$$\frac{n}{6\log_2 n} \le \pi(n) \le \frac{8n}{\log_2 n}.$$
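The bounds of Lemma 5.7.2 can be spot-checked numerically. The sketch below is an illustration only, not a proof: it tabulates $\pi(n)$ with the Sieve of Eratosthenes and confirms $n/(6\log_2 n) \le \pi(n) \le 8n/\log_2 n$ for $2 \le n \le 10^5$. The helper names are hypothetical.

```python
from math import isqrt, log2


def prime_counts(limit):
    """pi[n] = number of primes <= n for 0 <= n <= limit (Sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    pi, count = [], 0
    for flag in is_prime:
        count += flag
        pi.append(count)
    return pi


def bounds_hold(limit):
    pi = prime_counts(limit)
    return all(n / (6 * log2(n)) <= pi[n] <= 8 * n / log2(n)
               for n in range(2, limit + 1))


print(bounds_hold(10 ** 5))     # True
```

In this range $\pi(n)\log_2 n / n$ stays roughly between 0.5 and 1.8, well inside the constants $1/6$ and $8$.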


We now restate the AKS algorithm as given in [AKS]. 


AKS Algorithm Program: Input an integer $n > 1$.
1: if ($n = a^b$ for some natural numbers $a$, $b$ with $b > 1$) output COMPOSITE;
2: $r = 2$;
3: while ($r < n$) {
4:   if ($\gcd(n, r) \neq 1$) output COMPOSITE;
5:   if ($r$ is prime)
6:     let $q$ be the largest prime factor of $r - 1$;
7:     if ($q \ge 4\sqrt{r}\log_2 n$) and ($n^{(r-1)/q} \not\equiv 1 \bmod r$)
8:       break;
9:   $r \leftarrow r + 1$;
10: }
11: for $a = 1$ to $\lfloor 2\sqrt{r}\log_2 n \rfloor$
12:   if ($(x - a)^n \not\equiv x^n - a \bmod (x^r - 1, n)$) output COMPOSITE;
13: output PRIME;
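For experimentation, the listing above translates almost line by line into Python. This sketch is only practical for small $n$: polynomials modulo $x^r - 1$ are multiplied schoolbook-style rather than with the fast multiplication assumed in Theorem 5.7.2, and for small inputs the while loop runs all the way to $r = n$. Helper names are illustrative, logarithms are taken to base 2, and the step numbers in the comments refer to the listing.

```python
from math import gcd, log2


def is_perfect_power(n):
    """Step 1: True if n = a**b for integers a >= 2, b >= 2 (exact integer search)."""
    for b in range(2, n.bit_length() + 1):
        lo, hi = 2, 1 << (n.bit_length() // b + 1)
        while lo <= hi:
            mid = (lo + hi) // 2
            power = mid ** b
            if power == n:
                return True
            if power < n:
                lo = mid + 1
            else:
                hi = mid - 1
    return False


def is_prime_trial(m):
    """Trial division; used only for the small values of r in the while loop."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def largest_prime_factor(m):
    """Largest prime factor of m (returns 1 for m = 1)."""
    largest, d = 1, 2
    while d * d <= m:
        while m % d == 0:
            largest, m = d, m // d
        d += 1
    return m if m > 1 else largest


def poly_mulmod(f, g, r, n):
    """Product in Z_n[x]/(x^r - 1); polynomials are coefficient lists of length r."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                if gj:
                    k = (i + j) % r                        # x^r = 1
                    h[k] = (h[k] + fi * gj) % n
    return h


def poly_powmod(f, e, r, n):
    """f^e in Z_n[x]/(x^r - 1) by repeated squaring."""
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = poly_mulmod(result, f, r, n)
        f = poly_mulmod(f, f, r, n)
        e >>= 1
    return result


def aks(n):
    """True means PRIME, False means COMPOSITE, for an integer n > 1."""
    if is_perfect_power(n):                                        # step 1
        return False
    r = 2                                                          # step 2
    while r < n:                                                   # step 3
        if gcd(n, r) != 1:                                         # step 4
            return False
        if is_prime_trial(r):                                      # step 5
            q = largest_prime_factor(r - 1)                        # step 6
            if q >= 4 * r ** 0.5 * log2(n) and pow(n, (r - 1) // q, r) != 1:
                break                                              # steps 7-8
        r += 1                                                     # step 9
    for a in range(1, int(2 * r ** 0.5 * log2(n)) + 1):            # step 11
        lhs = poly_powmod([(-a) % n, 1] + [0] * (r - 2), n, r, n)  # (x - a)^n
        rhs = [0] * r
        rhs[n % r] = 1
        rhs[0] = (rhs[0] - a) % n                                  # x^(n mod r) - a
        if lhs != rhs:                                             # step 12
            return False
    return True                                                    # step 13


print([m for m in range(2, 30) if aks(m)])   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For inputs this small, composites are caught by steps 1 and 4, and the polynomial test in step 12 does the real work only for much larger $n$.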


The proof by Agrawal, Kayal, and Saxena is in two parts. The first establishes that the algorithm is correct; that is, the algorithm returns PRIME if and only if the input integer is prime. The second part shows that the algorithm runs in time polynomial in $\log_2 n$, the number of binary digits of $n$. The remainder of this section is taken from the original paper [AKS].


Theorem 5.7.1 (AKS) The AKS algorithm returns PRIME if and only if n is prime. 


The proof is established by a series of lemmas. The first lemma bounds the number of iterations in the while loop. This loop attempts to find a prime $r$ such that $r - 1$ has a large prime factor $q \ge 4\sqrt{r}\log_2 n$ and $q \mid o_r(n)$, where $o_r(n)$ is the multiplicative order of $n$ mod $r$.


Lemma 5.7.3 There exist positive constants $c_1, c_2$ for which there is a prime $r$ in the interval $[c_1(\log_2 n)^6, c_2(\log_2 n)^6]$ such that $r - 1$ has a prime factor $q$ with $q \ge 4\sqrt{r}\log_2 n$ and $q \mid o_r(n)$.
Proof Let $c$ and $P(n)$ be as in Lemma 5.7.1. For any $c_1, c_2$, call the primes $r$ in the interval $[c_1(\log_2 n)^6, c_2(\log_2 n)^6]$ that satisfy $P(r - 1) > (c_2(\log_2 n)^6)^{2/3} \ge r^{2/3}$ special primes. Then for $n$ large enough the number of special primes is greater than or equal to

(number of special primes in $[1, c_2(\log_2 n)^6]$) $-$ (number of primes in $[1, c_1(\log_2 n)^6]$).

Using Lemmas 5.7.1 and 5.7.2, this value is greater than or equal to

$$\frac{c\,c_2(\log_2 n)^6}{7\log_2\log_2 n} - \frac{8c_1(\log_2 n)^6}{6\log_2\log_2 n} = \frac{(\log_2 n)^6}{\log_2\log_2 n}\left(\frac{c\,c_2}{7} - \frac{8c_1}{6}\right).$$

Now choose the constants $c_1 \ge 4^6$ and $c_2$ so that $\frac{c\,c_2}{7} - \frac{8c_1}{6} > 0$. Call this positive value $c_3$.
Let $x = c_3(\log_2 n)^6$. Consider the product

$$\Pi = (n - 1)(n^2 - 1)\cdots\left(n^{\lfloor x^{1/3}\rfloor} - 1\right).$$

This product has at most $x^{2/3}\log_2 n$ different prime factors. Note that

$$x^{2/3}\log_2 n < c_3\,\frac{(\log_2 n)^6}{\log_2\log_2 n}.$$

It follows that there is at least one special prime, say $r$, that does not divide the product $\Pi$. This is the prime required in Lemma 5.7.3: $r - 1$ has a large prime factor $q \ge r^{2/3} \ge 4\sqrt{r}\log_2 n$, since $c_1 \ge 4^6$, and $q \mid o_r(n)$.


Lemma 5.7.4 If $n$ is prime, the AKS algorithm returns PRIME.


Proof Suppose that $n$ is a prime. Then the while loop in the algorithm cannot return COMPOSITE, since $\gcd(n, r) = 1$ for all $r \le c_2(\log_2 n)^6$, where $c_2$ is the constant from Lemma 5.7.3. Since $f(x)^p \equiv f(x^p) \bmod p$ for any integral polynomial $f(x)$ and prime $p$ (here $p = n$), the for loop in the algorithm also cannot return COMPOSITE. Hence the algorithm will identify $n$ as PRIME.


It must be shown now that if $n$ is composite then the algorithm will return COMPOSITE. Suppose that $n$ is composite with the distinct prime factors $p_1, \ldots, p_k$. Let $r$ be the prime found in the while loop as in Lemma 5.7.3. Then in this case $o_r(n) \mid \operatorname{lcm}(o_r(p_1), \ldots, o_r(p_k))$, and hence there exists a prime factor $p$ of $n$ such that $q \mid o_r(p)$, with $q$ the largest prime factor of $r - 1$. Let $p$ be such a prime factor of $n$.

The bottom loop in the program uses the value of $r$ to do polynomial computations on the $t = \lfloor 2\sqrt{r}\log_2 n \rfloor$ polynomials $x - a$ for $1 \le a \le t$. In the finite field $\mathbb{Z}_p$ the polynomial $x^r - 1$ has an irreducible factor $h(x)$ of degree $d = o_r(p)$. Now

$$(x - a)^n \equiv (x^n - a) \bmod (x^r - 1, n)$$

implies that

$$(x - a)^n \equiv (x^n - a) \bmod (h(x), p).$$

It follows that the polynomial identities on the set of $(x - a)$ hold in the quotient field $\mathbb{Z}_p[x]/(h(x))$. The polynomials $(x - a)$ generate a large cyclic group in this field.


Lemma 5.7.5 In the field $F = \mathbb{Z}_p[x]/(h(x))$ the group $G$ generated by the $t$ polynomials $(x - a)$ with $1 \le a \le t$ is cyclic and of size $\ge (d/t)^t$, where $d$ is the degree of $h(x)$.


Proof Recall that the multiplicative group of a finite field is cyclic. Since $F$ is finite and $G$ is a subgroup of its multiplicative group $F^*$, it follows that $G$ is also cyclic. What must be shown is the size.

Consider the set

$$S = \left\{\prod_{1 \le a \le t}(x - a)^{\alpha_a} : \sum_{1 \le a \le t}\alpha_a \le d - 1,\ \alpha_a \ge 0 \text{ for all } 1 \le a \le t\right\}.$$

The while loop ensures that the final $r$ on halting satisfies $r > q \ge 4\sqrt{r}\log_2 n > t$. If any of the $a$'s are congruent mod $p$, then $p < t < r$ and step 4 of the algorithm identifies $n$ as composite. Therefore any two elements of $S$ are distinct modulo $p$. This implies that all elements of $S$ are distinct in the field $F = \mathbb{Z}_p[x]/(h(x))$, since the degree of an element of $S$ is less than $d$, the degree of $h(x)$.

The cardinality of $S$ is then

$$|S| = \binom{t + d - 1}{t} = \frac{(t + d - 1)(t + d - 2)\cdots d}{t!} \ge \left(\frac{d}{t}\right)^t.$$

Since $S$ is a subset of $G$, this gives the desired result.
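The last inequality in the count of $S$ is the step that produces the bound $(d/t)^t$ in Lemma 5.7.5; it can be checked factor by factor:

$$|S| = \binom{t+d-1}{t} = \prod_{i=0}^{t-1}\frac{d+i}{i+1} \ge \left(\frac{d}{t}\right)^t, \qquad \text{since} \qquad t(d+i) - d(i+1) = d(t-1-i) + ti \ge 0 \quad (0 \le i \le t-1).$$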


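Since $q \mid d$ and $q \ge 4\sqrt{r}\log_2 n \ge 2t$, we have $d \ge 2t$, so the size of $G$ is at least $2^t = n^{2\sqrt{r}}$. From the previous lemma $G$ is cyclic. Let $g(x)$ be a generator of $G$. The order of $g(x)$ in $F$ is then at least $n^{2\sqrt{r}}$. Let

$$I_{g(x)} = \{m : g(x)^m \equiv g(x^m) \bmod (x^r - 1, p)\}.$$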


Lemma 5.7.6 The set $I_{g(x)}$ is closed under multiplication.


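Proof Let $m_1, m_2 \in I_{g(x)}$. Then

$$g(x)^{m_1} \equiv g(x^{m_1}) \bmod (x^r - 1, p)$$

and

$$g(x)^{m_2} \equiv g(x^{m_2}) \bmod (x^r - 1, p).$$

Substituting $x^{m_1}$ for $x$ in the second congruence, we get

$$g(x^{m_1})^{m_2} \equiv g(x^{m_1 m_2}) \bmod (x^{m_1 r} - 1, p),$$

and hence also modulo $(x^r - 1, p)$, since $x^r - 1$ divides $x^{m_1 r} - 1$. From these it follows that

$$g(x)^{m_1 m_2} \equiv \left(g(x^{m_1})\right)^{m_2} \equiv g(x^{m_1 m_2}) \bmod (x^r - 1, p),$$

and hence $m_1 m_2 \in I_{g(x)}$.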


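Lemma 5.7.7 Let $o_g$ be the order of $g(x)$ in $F$. Let $m_1, m_2 \in I_{g(x)}$. Then $m_1 \equiv m_2 \bmod r$ implies that $m_1 \equiv m_2 \bmod o_g$.


Proof Since $m_1 \equiv m_2 \bmod r$, we have $m_2 = m_1 + kr$ for some $k \ge 0$. Since $m_2 \in I_{g(x)}$, taking congruences in $F = \mathbb{Z}_p[x]/(h(x))$, we get

$$\begin{aligned}
g(x)^{m_2} &\equiv g(x^{m_2}) \bmod (x^r - 1, p)\\
\Rightarrow\quad g(x)^{m_2} &= g(x^{m_2})\\
\Rightarrow\quad g(x)^{m_1}\,g(x)^{kr} &= g(x^{m_1 + kr}) = g(x^{m_1})\\
\Rightarrow\quad g(x)^{m_1}\,g(x)^{kr} &= g(x)^{m_1}.
\end{aligned}$$

Here $g(x^{m_1 + kr}) = g(x^{m_1})$ because $x^r = 1$ in $F$, and the last step uses $m_1 \in I_{g(x)}$. Now $g(x) \not\equiv 0$ implies that $g(x)^{m_1} \not\equiv 0$, and hence it has a multiplicative inverse in $F$. Canceling it from both sides of the congruence above gives

$$g(x)^{kr} = 1.$$

Therefore

$$kr \equiv 0 \bmod o_g \quad\Rightarrow\quad m_1 \equiv m_2 \bmod o_g.$$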


Lemma 5.7.8 If $n$ is composite, the AKS algorithm will return COMPOSITE.


Proof Suppose that $n$ is composite and suppose that the algorithm returns PRIME. We derive a contradiction. The for loop ensures that for all $1 \le a \le 2\sqrt{r}\log_2 n$,

$$(x - a)^n \equiv (x^n - a) \bmod (x^r - 1, p).$$

The polynomial $g(x)$, the generator of $G$, is a product of powers of the $t$ polynomials $(x - a)$ with $1 \le a \le t$, all of which satisfy the above congruence. Thus

$$g(x)^n \equiv g(x^n) \bmod (x^r - 1, p).$$

Therefore $n \in I_{g(x)}$. Further, $p \in I_{g(x)}$ (since $f(x)^p \equiv f(x^p) \bmod p$ for every integral polynomial) and $1 \in I_{g(x)}$. We show that $I_{g(x)}$ has too many numbers less than $o_g$, contradicting Lemma 5.7.7.
Consider the set

$$E = \{n^i p^j : 0 \le i, j \le \lfloor\sqrt{r}\rfloor\}.$$

By Lemma 5.7.6, $E \subseteq I_{g(x)}$. Since $|E| = (1 + \lfloor\sqrt{r}\rfloor)^2 > r$, by the pigeonhole principle there are two elements $n^{i_1}p^{j_1}$ and $n^{i_2}p^{j_2}$ in $E$ with $i_1 \neq i_2$ or $j_1 \neq j_2$ such that

$$n^{i_1}p^{j_1} \equiv n^{i_2}p^{j_2} \bmod r.$$

Then from Lemma 5.7.7,

$$n^{i_1}p^{j_1} \equiv n^{i_2}p^{j_2} \bmod o_g.$$

This implies

$$n^{i_1 - i_2} \equiv p^{j_2 - j_1} \bmod o_g.$$

Since $o_g \ge n^{2\sqrt{r}}$, while $n^{|i_1 - i_2|} < n^{\sqrt{r}}$ and $p^{|j_2 - j_1|} < n^{\sqrt{r}}$, the above congruence becomes an equality. Since $p$ is prime, this equality implies $n = p^k$ for some $k \ge 1$. However, in step 1 of the algorithm composite numbers of the form $p^k$ for $k \ge 2$ have already been detected. Therefore $n = p$, a contradiction.


This establishes that the AKS algorithm is correct and completes the proof of Theorem 5.7.1.

The final theorem calculates the time complexity of the algorithm. For further 
details see [AKS]. 


Theorem 5.7.2 The asymptotic time complexity of the AKS algorithm is $O((\log_2 n)^{12} f(\log_2\log_2 n))$, where $f$ is a polynomial.


Proof Let $\tilde{O}(t(n))$ stand for $O(t(n)\cdot\operatorname{poly}(\log_2 t(n)))$, where $t(n)$ is some function of $n$ and poly means a polynomial in the argument. In this notation the theorem says that the time complexity is $\tilde{O}((\log_2 n)^{12})$. The first step in the algorithm has asymptotic time complexity $\tilde{O}((\log_2 n)^3)$, while the while loop makes $O((\log_2 n)^6)$ iterations.

The first step in the while loop, the gcd computation, takes $\operatorname{poly}(\log_2\log_2 r)$ asymptotic time. The next two steps in the while loop would take at most $r^{1/2}\operatorname{poly}(\log_2\log_2 n)$ in a brute-force implementation. The next three steps take at most $\operatorname{poly}(\log_2\log_2 n)$ steps. Thus the total asymptotic time taken by the while loop is

$$\tilde{O}\left((\log_2 n)^6 \cdot r^{1/2}\right) = \tilde{O}((\log_2 n)^9).$$

The for loop does modular computation over polynomials. If repeated squaring and Fast Fourier multiplication are used, then one iteration of the for loop takes $\tilde{O}(\log_2 n \cdot r\log_2 n)$ steps. Thus the for loop takes asymptotic time $\tilde{O}(r^{3/2}(\log_2 n)^3) = \tilde{O}((\log_2 n)^{12})$.


As pointed out in [AKS], in practice the algorithm should actually work much faster. This is due to the relationship to an older conjecture involving what are called Sophie Germain primes. If both $r$ and $\frac{r-1}{2}$ are primes, then $\frac{r-1}{2}$ is a Sophie Germain prime and $r$ is a co-Sophie Germain prime. In this case $P(r - 1) = \frac{r-1}{2}$. It has been conjectured that the number of co-Sophie Germain primes up to $x$ is asymptotic to $\frac{Dx}{(\ln x)^2}$, where $D$ is the twin prime constant (see Section 5.2.1). The conjecture has been verified for $r \le 10^{10}$. If the conjecture is true, then the while loop exits with an $r$ of size $O((\log_2 n)^2)$, taking the overall complexity to $\tilde{O}((\log_2 n)^6)$.


5.8 Exercises 


5.1 Use trial division to determine which, if any, of the following integers are prime.
(a) 10387   (b) 269   (c) 46411


5.2 Use the Sieve of Eratosthenes to develop a list of primes less than 300. (Note that this list could be used for Problem 5.1.)


5.3 Use the modified Sieve of Eratosthenes to find the integers less than 100 and 
relatively prime to 891. 


5.4 Apply Legendre’s formula to evaluate 
(a) Noss(200) (b) Ngo1 (100) 


5.5 Let $P(x)$ denote the number of primes $p \le x$ for which $p + 2$ is prime. Then by Lemma 5.2.4, for $x \ge 3$ we have

$$P(x) \le c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2,$$

where $c$ is a constant. Show that this implies that for $x \ge 3$

$$P(x) \le k\,\frac{x}{(\ln x)^{3/2}},$$

where $k$ is a constant.

5.6 Use the integral test for infinite series to show that

$$\sum_{r=1}^{\infty}\frac{1}{r(\ln(r+1))^{3/2}}$$

converges.

5.7 Prove that 


_yym+l n _4yym n—1 m+1 1 
a) (4 i)t! v("7')= ae Ea) 
5.8 Use the Fermat probable prime test to determine if 42671 is prime or not.

5.9 Use the Lucas test to establish that 271 is prime.

5.10 Show that if $n$ is prime and $0 < k < n$, then the binomial coefficient $\binom{n}{k}$ is congruent to 0 mod $n$.

5.11 Use Problem 5.10 to show that if $p$ is prime, then for every $a \in \mathbb{Z}_p$

$$(x - a)^p = x^p - a \text{ in } \mathbb{Z}_p[x].$$


5.12 Determine the bases b (if any) 0 < b < 14 for which 14 is a pseudoprime to 
the base b. 

5.13 Prove Lemma 5.3.1: If $n$ is a pseudoprime to the base $b_1$ and also a pseudoprime to the base $b_2$, then it is a pseudoprime to the base $b_1 b_2$.

5.14 Show that $561 = 3 \cdot 11 \cdot 17$ is the smallest Carmichael number. (Use the Korselt criterion together with Corollary 5.3.1.)

5.15 Define the sequence $(S_n)$ inductively by

$$S_1 = 4 \quad \text{and} \quad S_n = S_{n-1}^2 - 2.$$

Let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$. Show that $u + v = 4 = S_1$ and $uv = 1$. Then use induction to show that

$$S_n = u^{2^{n-1}} + v^{2^{n-1}}.$$


5.16 Let F,, = 27’ + 1 be the nth Fermat number. Show that (=) = —1 where 
(#) is the Jacobi symbol. 

5.17 Show that if $p, q$ are distinct primes and $e, d$ are positive integers with $\gcd(e, (p - 1)(q - 1)) = 1$ and $ed \equiv 1 \bmod (p - 1)(q - 1)$, then $a^{ed} \equiv a \bmod pq$ for any integer $a$. (This is the basis of the decryption function used in the RSA algorithm.)

5.18 The following table gives the approximate statistical frequency of occurrence 
of letters in the English language. The passage below is encrypted with a simple 
permutation cipher without punctuation. Use a frequency analysis to try to decode 
it. 


letter   frequency   letter   frequency   letter   frequency
A        .082        B        .015        C        .027
D        .04         E        .127        F        .022
G        .020        H        .061        I        .070
J        .002        K        .008        L        .040
M        .024        N        .067        O        .075
P        .019        Q        .001        R        .060
S        .06         T        .09         U        .028
V        .010        W        .023        X        .001
Y        .020        Z        .001
ZKIRNVMENY VIRHZKLHRGREVRMGVTVIDSR
XSSZHZHGHLMOBKLHRGREV WRERHLIHLMVZ 
MWRGHVOUKIRNVMENYVIHKOZBZXIFXRZOI 
LOVRMMENYVIGSVLIBZMWZIVGS VYZHRHUL 
IGHSHVMLGVHGSVIVZIVRMURMRGVOBNZMB 
KIRNVHZMWGSVBHVIEVZHYFROWRMTYOLXP 
HULIZOOGS VKLHRGREVRMGVTVIH 


5.19 Encrypt the message NO MORE WAR using an affine cipher with single-letter keys $a = 7$, $b = 5$.


5.20 Encrypt the message NO MORE WAR using an affine cipher on 2-vectors 
of letters and an encrypting keys 


5.21 What is the decryption algorithm for the affine cipher given in the last problem?

5.22 How many different affine enciphering transformations are there on single letters with an $N$-letter alphabet?

5.23 Let $N \in \mathbb{N}$ with $N > 2$, and let $n \mapsto an + b$ with $b \neq 0$, $(a, N) = 1$ and $(a - 1, N) = 1$. Show that there is always a unique fixed letter. (This can be used in cryptanalysis.)

5.24 Let $N \in \mathbb{N}$ with $N > 2$, and let $n \mapsto an + b$ with $(a, N) = 1$ be an affine cipher on an $N$-letter alphabet. Show that if any two letters are guessed, $n_1 \mapsto m_1$ and $n_2 \mapsto m_2$ with $(n_1 - n_2, N) = 1$, then the code can be broken.


Chapter 6 
Primes and Algebraic Number Theory 


6.1 Algebraic Number Theory 


The final major area within the theory of numbers is algebraic number theory. In 
this chapter we present an overview of the major ideas in this discipline. In line with 
the theme of these notes, we will concentrate on primes and prime decompositions. 

Algebraic number theory is roughly the study of algebraic number fields, which 
are finite extensions of the rationals, and their rings of algebraic integers. We will 
define each of these concepts formally in Section 6.3. Algebraic number theory 
lies between pure abstract algebra and (elementary) number theory. It originated in 
methods to solve classical problems in number theory, such as proving Fermat’s Big 
Theorem, but evolved into an independent discipline. It is a true melding of algebra 
and number theory. Whereas in many places in these notes we used abstract algebra to 
simplify a proof or clarify an idea in elementary number theory, in algebraic number 
theory the algebraic concepts are crucial to what is being studied. In fact, the basic 
terminology and format of modern abstract algebra come from algebraic number 
theory. While the concepts of rings and fields were implicit in the work of Galois 
and Abel, it was Kronecker and Dedekind working in number theory who formally 
defined them in the modern manner. 

The starting off point for algebraic number theory was the observation, first made 
by Gauss, that unique factorization into primes is not unique to the integers. That 
is, there are other algebraic systems which also permit such unique factorizations. 
Gauss, in attempting to extend the quadratic reciprocity law, investigated the complex integers $\mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\}$. They are now called the Gaussian integers in his honor. He discovered that he could define divisibility and primes in $\mathbb{Z}[i]$ and that there was a division algorithm analogous to the division algorithm in the ordinary integers $\mathbb{Z}$. From this he derived that in $\mathbb{Z}[i]$ there was unique factorization into primes, of course primes in $\mathbb{Z}[i]$. We will discuss the Gaussian integers in detail in Sections 6.2 and 6.3.
Kummer, who studied with Gauss, extended these investigations to complex integers, which was Kummer's terminology, of the form

$$a_0 + a_1\omega + \cdots + a_{p-1}\omega^{p-1},$$

where $a_i \in \mathbb{Z}$ and $\omega$ is a primitive $p$th root of unity, where $p$ is a prime. That is, $\omega$ is a root of the polynomial equation $x^p - 1 = 0$ with $\omega \neq 1$. His original motivation was an attempt to prove Fermat's Last Theorem for prime exponents. Kummer's idea was to take $x^p + y^p$ and, for odd $p$, factor it into

$$x^p + y^p = (x + y)(x + \omega y)\cdots(x + \omega^{p-1}y).$$


Kummer defined divisibility and primes for the sets of complex integers. However 
it became clear that for some primes p they did not satisfy unique factorization. 
We will give an example to show this in the next section. To alleviate this problem, 
the lack of unique factorization, Kummer adjoined to his sets of complex integers 
certain other complex numbers which he called ideal numbers. By allowing these 
ideal numbers, there was unique factorization. This allowed him to actually settle 
many cases of Fermat’s Last Theorem for prime exponents. 

Dedekind, another student of Gauss, extended both Gauss’ work on the Gaussian 
integers and Kummer’s ideal numbers. Dedekind introduced the idea of an algebraic 
integer which is defined as a complex number that is a root of a monic polynomial 
with integral coefficients. That is, $\theta \in \mathbb{C}$ is an algebraic integer if $p(\theta) = 0$, where

$$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0, \quad n \ge 1,\ a_i \in \mathbb{Z}.$$

Each integer $m$ is of course an algebraic integer satisfying the polynomial $p(x) = x - m$. In this context the ordinary integers are called the rational integers. Dedekind
introduced the definition of a ring and showed that the set of algebraic integers forms 
a ring. Further he showed that the algebraic integers within each algebraic number 
field form a ring within that number field. We will discuss algebraic integers in 
Section 6.4. 

To handle unique factorization, Dedekind worked not with the algebraic integers 
themselves, but with special subrings of algebraic integers that he called ideals in 
honor of Kummer’s ideal numbers. He then showed that he could define divisibility 
and primes for ideals and then that there was unique factorization of ideals. The 
concept of an ideal in a ring is now fundamental in abstract algebra. We will discuss
general ideals in the next section and then ideals in algebraic number rings in
Section 6.5. 

Finally Kronecker, a student of Kummer, developed a general theory of fields 
and algebraic numbers over a field. By considering polynomial rings over a general 
field he showed, given an irreducible polynomial, that it was always possible to 
construct a field where this polynomial has a root. This is done by adjoining the root 
to the original field. This is now known as Kronecker’s Theorem. It was implied 
in the work of Abel and Galois done earlier but Kronecker's Theorem is now the
cornerstone of Galois Theory. 

We begin our overview of algebraic number theory by looking at unique factorization.


6.2 Unique Factorization Domains 


The true beginning point for the theory of numbers was the Fundamental Theorem 
of Arithmetic which said that any rational integer could be factored into primes and 
that this factorization is unique up to ordering and unit factors. Algebraic number 
theory begins with the observation that this property is not unique to Z but actually 
holds in many other integral domains. We start by reviewing some basic concepts 
from abstract algebra that were introduced in Chapter 2. 

Recall that an integral domain $R$ is a commutative ring $R$ with an identity and with no zero divisors. That is, $R$ has the property that if $ab = 0$ with $a, b \in R$, then either $a = 0$ or $b = 0$. It is clear that the integers $\mathbb{Z}$ form an integral domain. A unit in an integral domain is an element $u$ with a multiplicative inverse, that is, there exists an element $u_1$, which we denote by $u^{-1}$, such that $u \cdot u^{-1} = 1$. It is easy to show that the product of two units is again a unit, and hence the set of units in an integral domain forms a group under multiplication (see Chapter 2 and the exercises). A field $F$ is an integral domain where every nonzero element is a unit. The rationals $\mathbb{Q}$, the reals $\mathbb{R}$, and the complex numbers $\mathbb{C}$ all form fields.

Two elements $r_1, r_2$ in an integral domain $R$ are associates if there exists a unit $u$ such that $r_1 = ur_2$. We now extend to any integral domain the ideas of divisibility and primes.


Definition 6.2.1 Let $R$ be an integral domain. If $r_1, r_2 \in R$, then $r_1$ divides $r_2$, denoted $r_1 \mid r_2$, if there exists an $r_3 \in R$ such that $r_2 = r_1 r_3$. In analogy with the integers, the elements $r_1, r_3$ are factors of $r_2$ and $r_1 r_3$ is a factorization of $r_2$. An element $r \in R$ is a prime if $r$ is not a unit and whenever $r = r_1 r_2$ one factor must be a unit.


We now use the statement of the Fundamental Theorem of Arithmetic to define a 
unique factorization domain. 


Definition 6.2.2 An integral domain $R$ is a unique factorization domain, or UFD, if for each $r \in R$ either $r = 0$, $r$ is a unit, or $r$ has a factorization into primes which is unique up to ordering and unit factors. This means that if

$$r = p_1 \cdots p_m = q_1 \cdots q_k,$$

where the $p_i$ and $q_j$ are primes, then $m = k$ and each $p_i$ is an associate of some $q_j$.


Hence in this more general algebraic language the Fundamental Theorem of Arithmetic states that the integers $\mathbb{Z}$ are a unique factorization domain. However, they are not the only one. The complex integers, $\mathbb{Z}[i]$, are also a UFD. We will look at these in the next section. As a first example we show that the ring of polynomials over any field $F$ (which we define below) forms a UFD.

If $F$ is a field and $n$ is a nonnegative integer, then a polynomial of degree $n$ over $F$ is a formal sum of the form

$$P(x) = a_0 + a_1x + \cdots + a_nx^n \tag{6.2.1}$$

with $a_i \in F$ for $i = 0, \ldots, n$, $a_n \neq 0$, and $x$ an indeterminate. A polynomial $P(x)$ over $F$ is either a polynomial of some degree or the expression $P(x) = 0$, which is called the zero polynomial and has no degree. We denote the degree of $P(x)$ by $\deg P(x)$. A polynomial of zero degree has the form $P(x) = a_0$ and is called a constant polynomial; it can be identified with the corresponding element of $F$. We also call the zero polynomial a constant polynomial and identify it with the zero element of $F$. The elements $a_i \in F$ are called the coefficients of $P(x)$; $a_n$ is the leading coefficient. If $a_n = 1$, $P(x)$ is called a monic polynomial. Two nonzero polynomials are equal if and only if they have the same degree and exactly the same coefficients. A polynomial of degree 1 is called a linear polynomial, while one of degree two is a quadratic polynomial.

We denote by $F[x]$ the set of all polynomials over $F$, and we will show that $F[x]$ becomes a unique factorization domain. We first define addition, subtraction, and multiplication on $F[x]$ by algebraic manipulation. That is, suppose $P(x) = a_0 + a_1x + \cdots + a_nx^n$ and $Q(x) = b_0 + b_1x + \cdots + b_mx^m$; then

$$P(x) \pm Q(x) = (a_0 \pm b_0) + (a_1 \pm b_1)x + \cdots,$$

that is, the coefficient of $x^i$ in $P(x) \pm Q(x)$ is $a_i \pm b_i$, where $a_i = 0$ for $i > n$ and $b_j = 0$ for $j > m$. Multiplication is given by

$$P(x)Q(x) = (a_0b_0) + (a_1b_0 + a_0b_1)x + (a_2b_0 + a_1b_1 + a_0b_2)x^2 + \cdots + (a_nb_m)x^{n+m},$$

that is, the coefficient of $x^i$ in $P(x)Q(x)$ is $(a_0b_i + a_1b_{i-1} + \cdots + a_ib_0)$.


EXAMPLE 6.2.1 Let $P(x) = 3x^2 + 4x - 6$ and $Q(x) = 2x + 7$ be in Q[x].
Then
$$P(x) + Q(x) = 3x^2 + 6x + 1$$

and
$$P(x)Q(x) = (3x^2 + 4x - 6)(2x + 7) = 6x^3 + 29x^2 + 16x - 42.$$
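The coefficient rules above translate directly into code. The following Python sketch is only an illustration: it assumes a polynomial is stored as the list of its coefficients $[a_0, a_1, \ldots, a_n]$ (lowest degree first), uses exact rational arithmetic from `fractions.Fraction` in place of a general field F, and its helper names are hypothetical. It reproduces Example 6.2.1.

```python
from fractions import Fraction
from itertools import zip_longest

# A polynomial a_0 + a_1 x + ... + a_n x^n is stored as [a_0, a_1, ..., a_n];
# the zero polynomial is the empty list [].

def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_add(p, q):
    # coefficient of x^i is a_i + b_i, missing coefficients counting as 0
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))

def poly_sub(p, q):
    return trim(a - b for a, b in zip_longest(p, q, fillvalue=0))

def poly_mul(p, q):
    # coefficient of x^k is the sum of a_i * b_j over all i + j = k
    if not p or not q:
        return []
    c = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return trim(c)

P = [Fraction(-6), Fraction(4), Fraction(3)]   # 3x^2 + 4x - 6
Q = [Fraction(7), Fraction(2)]                 # 2x + 7
print(poly_add(P, Q))   # coefficients of 3x^2 + 6x + 1
print(poly_mul(P, Q))   # coefficients of 6x^3 + 29x^2 + 16x - 42
```

The degree of a product is visible here as `len(p) + len(q) - 2`, which is the first part of the lemma that follows.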


From the definitions the following degree relationships are clear. The proofs are 
in the exercises. 


Lemma 6.2.1 Let $P(x) \neq 0$ and $Q(x) \neq 0$ be in F[x]. Then:
1. $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$.
2. $\deg (P(x) \pm Q(x)) \le \max(\deg P(x), \deg Q(x))$ if $P(x) \pm Q(x) \neq 0$.
We next obtain the following. 


Theorem 6.2.1 If F is a field, then F[x] forms an integral domain. F can be naturally embedded into F[x] by identifying each element of F with the corresponding
constant polynomial. The only units in F[x] are the nonzero elements of F.


Proof Verification of the basic ring properties is purely computational and is left to
the exercises. Since $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$ for nonzero polynomials,
it follows that if $P(x) \neq 0$ and $Q(x) \neq 0$ then $P(x)Q(x) \neq 0$, and therefore
F[x] is an integral domain.

If G(x) is a unit in F[x], then there exists an $H(x) \in F[x]$ with

$$G(x)H(x) = 1.$$

From the degrees we have $\deg G(x) + \deg H(x) = 0$, and since $\deg G(x) \ge 0$ and
$\deg H(x) \ge 0$, this is possible only if $\deg G(x) = \deg H(x) = 0$. Therefore
$G(x) \in F$. Conversely, every nonzero element of F has an inverse in F and is therefore a unit in F[x].


Now that we have F[x] as an integral domain we proceed to show that there is
unique factorization into primes. We first repeat the definition of a prime in F[x].
If $f(x) \neq 0$ is not a unit and has no nontrivial, nonunit factors (it cannot be factorized into polynomials of lower degree), then f(x) is a prime in F[x], or a prime polynomial. A prime
polynomial is also called an irreducible polynomial. Clearly, if $\deg g(x) = 1$ then
g(x) is irreducible.

The fact that F[x] is a UFD follows from the division algorithm for polynomials,
which is entirely analogous to the division algorithm for integers.


Lemma 6.2.2 (Division Algorithm in F[x]) If $f(x) \neq 0$ and $g(x) \neq 0$ are in F[x], then
there exist unique polynomials $q(x), r(x) \in F[x]$ such that $f(x) = q(x)g(x) +
r(x)$, where $r(x) = 0$ or $\deg r(x) < \deg g(x)$. (The polynomials q(x) and r(x)
are called, respectively, the quotient and remainder.)


This lemma is essentially long division of polynomials. A formal proof is based
on induction on the degree of f(x). We omit this but give some examples from Q[x].


EXAMPLE 6.2.2
(a) Let $f(x) = 3x^4 - 6x^2 + 8x - 6$ and $g(x) = 2x^2 + 4$. Then

$$\frac{3x^4 - 6x^2 + 8x - 6}{2x^2 + 4} = \frac{3}{2}x^2 - 6 \quad \text{with remainder } 8x + 18.$$

Thus here $q(x) = \frac{3}{2}x^2 - 6$ and $r(x) = 8x + 18$.
(b) Let $f(x) = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x$ and $g(x) = x^2 + x$. Then

$$\frac{2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x}{x^2 + x} = 2x^3 + 6x + 4.$$

Thus here $q(x) = 2x^3 + 6x + 4$ and $r(x) = 0$.
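Long division can be sketched the same way. The helper `poly_divmod` below is a hypothetical name, and polynomials are again assumed to be coefficient lists $[a_0, \ldots, a_n]$ over Q, lowest degree first, with exact `Fraction` arithmetic. Each pass of the loop cancels the current leading term of the remainder, so its degree strictly drops until it is below $\deg g(x)$.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Long division in Q[x] (hypothetical helper).

    Polynomials are coefficient lists [a_0, ..., a_n], lowest degree first,
    with nonzero leading coefficient; [] is the zero polynomial.
    Returns (q, r) with f = q*g + r and r == [] or deg r < deg g.
    """
    if not g:
        raise ZeroDivisionError("g(x) must be nonzero")
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    lead = Fraction(g[-1])
    while r and len(r) >= len(g):
        shift = len(r) - len(g)      # degree of the next quotient term
        coef = r[-1] / lead          # its coefficient
        q[shift] = coef
        for i, b in enumerate(g):    # subtract coef * x^shift * g(x)
            r[i + shift] -= coef * b
        while r and r[-1] == 0:      # the leading term has cancelled
            r.pop()
    while q and q[-1] == 0:
        q.pop()
    return q, r

# Example 6.2.2 (a): f = 3x^4 - 6x^2 + 8x - 6, g = 2x^2 + 4
print(poly_divmod([-6, 8, -6, 0, 3], [4, 0, 2]))
# Example 6.2.2 (b): f = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x, g = x^2 + x
print(poly_divmod([0, 4, 10, 6, 2, 2], [0, 1, 1]))
```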

Using the division algorithm, the development of unique factorization follows in
exactly the same manner as in Z. We need the idea of a greatest common divisor,
or gcd, and the lemmas following the definition.


Definition 6.2.3 (1) If $f(x), g(x) \in F[x]$ with $g(x) \neq 0$, then a polynomial $d(x) \in
F[x]$ is the greatest common divisor, or gcd, of f(x), g(x) if d(x) is monic, d(x)
divides both g(x) and f(x), and whenever $d_1(x)$ divides both g(x) and f(x), then $d_1(x)$
divides d(x). We write $d(x) = (g(x), f(x))$. If $(f(x), g(x)) = 1$, then we say that
f(x) and g(x) are relatively prime. If $f(x) = g(x) = 0$, then $d(x) = 0$ is the gcd
of f(x) and g(x).

(2) An expression of the form $f(x)h(x) + g(x)k(x)$ is called a linear combination of f(x), g(x).


Lemma 6.2.3 Given $f(x), g(x) \in F[x]$ with $g(x) \neq 0$, the gcd exists, is unique,
and equals the monic polynomial of least degree that is expressible as a linear
combination of f(x), g(x).


Finding the gcd of two polynomials is done in the same manner as finding the gcd
of two integers. That is, we use the Euclidean algorithm. Recall from Chapter 2
that this is done in the following manner. Suppose $f(x) \neq 0$ and $g(x) \neq 0$ are in F[x]. Use
repeated applications of the division algorithm to obtain the sequence:

$$\begin{aligned}
f(x) &= q(x)g(x) + r(x) \\
g(x) &= q_1(x)r(x) + r_1(x) \\
r(x) &= q_2(x)r_1(x) + r_2(x) \\
&\ \ \vdots \\
r_{k-1}(x) &= q_{k+1}(x)r_k(x).
\end{aligned}$$

Since each division reduces the degree of the remainder, and degrees are nonnegative integers, this process will
ultimately end. Let $r_k(x)$ be the last nonzero remainder polynomial and suppose c
is the leading coefficient of $r_k(x)$. Then $c^{-1}r_k(x)$ is the gcd. If there does not exist
a last nonzero remainder polynomial, then $r(x) = 0$ and g(x) is a divisor of f(x).
In this case $(f(x), g(x)) = c^{-1}g(x)$, where c is the leading coefficient of g(x). We
give an example.


EXAMPLE 6.2.3 In Q[x] find the gcd of the polynomials
$$f(x) = x^3 - 1 \quad \text{and} \quad g(x) = x^2 - 2x + 1$$

and express it as a linear combination of the two.
Using the Euclidean algorithm we obtain
$$\begin{aligned}
x^3 - 1 &= (x^2 - 2x + 1)(x + 2) + (3x - 3), \\
x^2 - 2x + 1 &= (3x - 3)\left(\tfrac{1}{3}x - \tfrac{1}{3}\right).
\end{aligned}$$
Therefore the last nonzero remainder is $3x - 3$. Since the gcd must be a monic
polynomial, we divide through by 3, and hence the gcd is $x - 1$.
Working backwards we have

$$3x - 3 = (x^3 - 1) - (x^2 - 2x + 1)(x + 2),$$

so

$$x - 1 = \tfrac{1}{3}(x^3 - 1) - \tfrac{1}{3}(x^2 - 2x + 1)(x + 2),$$
expressing the gcd as a linear combination of the two given polynomials.
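The Euclidean algorithm together with the back-substitution can be automated by carrying, alongside each remainder, the multipliers that express it as a linear combination of f(x) and g(x). The sketch below assumes the same coefficient-list representation over Q (lowest degree first) and uses hypothetical helper names; it mirrors the hand computation rather than aiming for efficiency. Run on Example 6.2.3 it returns $x - 1$ with multipliers $\frac{1}{3}$ and $-\frac{1}{3}(x + 2)$.

```python
from fractions import Fraction

def trim(p):
    """Remove trailing zero coefficients (in place) and return the list."""
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])

def mul(p, q):
    if not p or not q:
        return []
    c = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return trim(c)

def divmod_poly(f, g):
    """Division algorithm in Q[x]; g must be nonzero."""
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    while r and len(r) >= len(g):
        s, c = len(r) - len(g), r[-1] / g[-1]
        q[s] = c
        for i, b in enumerate(g):
            r[i + s] -= c * b
        trim(r)
    return trim(q), r

def ext_gcd(f, g):
    """Return (d, h, k) with d = gcd(f, g) monic and d = f*h + g*k (g nonzero)."""
    # invariants: r0 = f*h0 + g*k0 and r1 = f*h1 + g*k1
    r0, h0, k0 = [Fraction(c) for c in f], [Fraction(1)], []
    r1, h1, k1 = [Fraction(c) for c in g], [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, h0, k0, r1, h1, k1 = (r1, h1, k1, r,
                                  sub(h0, mul(q, h1)), sub(k0, mul(q, k1)))
    c = r0[-1]   # leading coefficient of the last nonzero remainder
    return [a / c for a in r0], [a / c for a in h0], [a / c for a in k0]

# Example 6.2.3: f = x^3 - 1, g = x^2 - 2x + 1
print(ext_gcd([-1, 0, 0, 1], [1, -2, 1]))
```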
The next component is Euclid’s Lemma applied to the polynomial ring. 


Lemma 6.2.4 (Euclid’s Lemma) If p(x) is an irreducible polynomial and p(x)
divides f(x)g(x), then p(x) divides f(x) or p(x) divides g(x).


Proof The proof is identical to the proof in Z. Suppose p(x) does not divide f(x).
Then since p(x) is irreducible, p(x) and f(x) must be relatively prime. Therefore,
there exist h(x), k(x) such that

$$f(x)h(x) + p(x)k(x) = 1.$$
Multiply through by g(x) to obtain
$$g(x)f(x)h(x) + g(x)p(x)k(x) = g(x).$$

Now p(x) divides each term on the left-hand side since $p(x) \mid g(x)f(x)$, and therefore
$p(x) \mid g(x)$.


Theorem 6.2.2 If $f(x) \in F[x]$ is nonconstant, then f(x) has a factorization into irreducible polynomials that is unique up to ordering and unit factors.
In other words F[x] is a UFD.


Proof The proof is almost identical to the proof for Z, and we sketch it. We outlined
this sketch in the exercises to Chapter 2. First we use induction on the degree of f(x)
to obtain a prime factorization. If $\deg f(x) = 1$, then f(x) is irreducible, so suppose
$\deg f(x) = n > 1$. If f(x) is irreducible, then it has such a prime factorization. If
f(x) is not irreducible, then $f(x) = h(x)g(x)$ with $\deg g(x) < n$ and $\deg h(x) < n$.
By the inductive hypothesis, both g(x) and h(x) have prime factorizations, and f(x)
does as well.
Now suppose that f(x) has two prime factorizations

$$f(x) = p_1(x)^{n_1} \cdots p_k(x)^{n_k} = q_1(x)^{m_1} \cdots q_t(x)^{m_t},$$

where $p_i(x)$, $i = 1, \ldots, k$, and $q_j(x)$, $j = 1, \ldots, t$, are prime polynomials, the $p_i(x)$
are pairwise relatively prime, and so are the $q_j(x)$. Consider $p_1(x)$. Then $p_1(x) \mid q_1(x)^{m_1} \cdots
q_t(x)^{m_t}$, and hence from Euclid’s lemma, $p_1(x) \mid q_j(x)$ for some j. Since both are
irreducible, $p_1(x) = cq_j(x)$ for some unit c. By repeated application of this argument we get, after reordering, that $k = t$ and $n_i = m_i$. Thus we have the same primes with the same
multiplicities but perhaps unit factors, proving the theorem.


A polynomial $P(x) \in F[x]$ can also be considered as a function
$$P : F \to F$$

via the substitution process. If $P(x) = a_0 + a_1x + \cdots + a_nx^n \in F[x]$ and $t \in F$,
then
$$P(t) = a_0 + a_1t + \cdots + a_nt^n \in F,$$

since F is closed under all the operations used in the polynomial. If $r \in F$, $P(x) \in
F[x]$, and $P(r) = 0$ under the substitution process, we say that r is a root of P(x)
or a zero of P(x). Synonymously we say that r satisfies P(x).

Before closing this section, we further review some properties of roots of poly- 
nomials which will be essential when we deal with algebraic number fields. First we 
have an important divisibility property. 


Lemma 6.2.5 If $P(x) \neq 0$ and c is a root of P(x), then $x - c$ divides P(x), that
is, $P(x) = (x - c)Q(x)$ with $\deg Q(x) = \deg P(x) - 1$.


Proof Suppose $P(c) = 0$. Then from the division algorithm
$$P(x) = (x - c)Q(x) + r(x),$$
where $r(x) = 0$ or $r(x) = f \in F$, since $\deg r(x) < \deg(x - c) = 1$. Therefore
$$P(x) = (x - c)Q(x) + f.$$
Substituting c, we have $P(c) = 0 + f = 0$, so $f = 0$. Hence

$$P(x) = (x - c)Q(x),$$
and $\deg Q(x) = \deg P(x) - 1$ by Lemma 6.2.1.
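The division by $x - c$ in this proof can be carried out by synthetic division (Horner's scheme), which also shows that the constant remainder f equals $P(c)$. In the Python sketch below, an illustration only with a hypothetical helper name, the coefficients are listed from the highest degree down, the usual layout for synthetic division; the polynomial $x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)$ is chosen purely as an example.

```python
def synthetic_division(coeffs, c):
    """Divide P(x) by (x - c) using Horner's scheme.

    coeffs lists a_n, a_{n-1}, ..., a_0 (highest degree first).
    Returns (Q, P(c)) where P(x) = (x - c) Q(x) + P(c) and deg Q = n - 1.
    """
    acc = 0
    partial = []
    for a in coeffs:
        acc = acc * c + a          # running Horner value
        partial.append(acc)
    return partial[:-1], partial[-1]

# P(x) = x^3 - 6x^2 + 11x - 6 has the root c = 1, so (x - 1) divides it
Q, value = synthetic_division([1, -6, 11, -6], 1)
print(Q, value)   # Q = x^2 - 5x + 6 and remainder P(1) = 0
```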


Corollary 6.2.1 An irreducible polynomial of degree greater than one over a field 
F has no roots in F. 
From this we obtain the following result which bounds the number of roots of a
polynomial over a field. 


Lemma 6.2.6 A polynomial of degree n in F[x] can have at most n distinct roots.


Proof Suppose P(x) has degree n and suppose $c_1, \ldots, c_n$ are n distinct roots. From
repeated application of Lemma 6.2.5,

$$P(x) = k(x - c_1) \cdots (x - c_n),$$
where $k \in F$ is nonzero. Suppose c is any other root. Then
$$P(c) = 0 = k(c - c_1) \cdots (c - c_n).$$

Since a field F has no zero divisors and $k \neq 0$, one of the factors must be zero: $c - c_i = 0$ for
some i, and hence $c = c_i$.
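The hypothesis that F has no zero divisors is essential in this argument. The brute-force sketch below, with a hypothetical helper `roots_mod`, counts the roots of $x^2 - 1$ modulo m: in the field Z/7Z it finds the two roots 1 and 6, while in Z/8Z, where $2 \cdot 4 = 0$, it finds four roots, more than the degree.

```python
def roots_mod(coeffs, m):
    """All residues t in {0, ..., m-1} with P(t) = 0 mod m.

    coeffs lists a_0, a_1, ..., a_n (lowest degree first).
    """
    def value(t):
        acc = 0
        for a in reversed(coeffs):     # Horner evaluation, reduced mod m
            acc = (acc * t + a) % m
        return acc
    return [t for t in range(m) if value(t) == 0]

# x^2 - 1 over the field Z/7Z: at most 2 roots
print(roots_mod([-1, 0, 1], 7))
# x^2 - 1 over Z/8Z, which has zero divisors: 4 roots
print(roots_mod([-1, 0, 1], 8))
```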


Besides there being at most n roots (with n the degree), the roots of a polynomial
are determined by its factorization. Suppose P(x) has degree n and distinct roots $c_1, \ldots, c_k$ with $k \le n$. Then
from the unique factorization in F[x] we have

$$P(x) = (x - c_1)^{m_1} \cdots (x - c_k)^{m_k}Q_1(x) \cdots Q_t(x),$$

where $Q_i(x)$, $i = 1, \ldots, t$, are irreducible and of degree greater than 1. The exponents
$m_i$ are called the multiplicities of the roots $c_i$. Let c be a root. Then as above,

$$(c - c_1)^{m_1} \cdots (c - c_k)^{m_k}Q_1(c) \cdots Q_t(c) = 0.$$

Now $Q_i(c) \neq 0$ for $i = 1, \ldots, t$, since the $Q_i(x)$ are irreducible of degree greater than 1 (Corollary 6.2.1). Therefore
$c - c_i = 0$ for some i, and hence $c = c_i$.

Finally the famous Fundamental Theorem of Algebra (see [FR 2]) says that any 
nonconstant complex polynomial must have a root. As a consequence of this and the 
divisibility property it follows that a complex polynomial of degree n must have n 
roots counting multiplicities. 


Theorem 6.2.3 (Fundamental Theorem of Algebra) If p(x) is a nonconstant complex polynomial, $p(x) \in C[x]$, then p(x) has a complex root.


6.2.1 Euclidean Domains and the Gaussian Integers 


In analyzing the proof of unique factorization in both Z and F[x] it is clear that it
depends primarily on the division algorithm. In Z the division algorithm depended on
the fact that the positive integers are well ordered, and in F[x] on the fact that the degrees
of nonzero polynomials are nonnegative integers and hence are also well ordered. This
basic idea can be generalized in the following way.
Definition 6.2.4 Let R be an integral domain. Then R is a Euclidean domain if
there exists a function N from $R^* = R \setminus \{0\}$ to the nonnegative integers such that

1. $N(r_1) \le N(r_1r_2)$ for any $r_1, r_2 \in R^*$.

2. For all $r_1, r_2 \in R$ with $r_2 \neq 0$ there exist $q, r \in R$ such that

$$r_1 = qr_2 + r,$$

where either $r = 0$ or $N(r) < N(r_2)$.


The function N is called a Euclidean norm on R.


Therefore Euclidean domains are precisely those integral domains which allow
division algorithms. In the integers Z define $N(z) = |z|$. Then N is a Euclidean norm
on Z and hence Z is a Euclidean domain. On F[x] define $N(p(x)) = \deg p(x)$ if
$p(x) \neq 0$. Then N is also a Euclidean norm on F[x], so that F[x] is also a Euclidean
domain. In any Euclidean domain we can mimic the proofs of unique factorization
in both Z and F[x] to obtain the following:


Theorem 6.2.4 Every Euclidean domain is a unique factorization domain. 


Before proving this theorem we must develop some results on the number theory 
of general Euclidean domains. First some properties of the norm. 


Lemma 6.2.7 If R is a Euclidean domain then
1. N(1) is minimal among $\{N(r) : r \in R^*\}$.
2. $N(u) = N(1)$ if and only if u is a unit.
3. $N(a) = N(b)$ for $a, b \in R^*$ if a, b are associates.
4. $N(a) < N(ab)$ unless b is a unit.
Proof (1) From property (1) of Euclidean norms we have
$$N(1) \le N(1 \cdot r) = N(r) \text{ for any } r \in R^*.$$
(2) Suppose u is a unit. Then there exists $u^{-1}$ with $u \cdot u^{-1} = 1$. Then

$$N(u) \le N(u \cdot u^{-1}) = N(1).$$

From the minimality of N(1) it follows that $N(u) = N(1)$.
Conversely suppose $N(u) = N(1)$. Apply the division algorithm to get

$$1 = qu + r.$$
If $r \neq 0$ then $N(r) < N(u) = N(1)$, contradicting the minimality of N(1). Therefore
$r = 0$ and $1 = qu$. Then u has a multiplicative inverse and hence is a unit.
(3) Suppose $a, b \in R^*$ are associates. Then $a = ub$ with u a unit. Then
$$N(b) \le N(ub) = N(a).$$
On the other hand $b = u^{-1}a$, so
$$N(a) \le N(u^{-1}a) = N(b).$$

Since $N(a) \le N(b)$ and $N(b) \le N(a)$ it follows that $N(a) = N(b)$.
(4) Suppose $N(a) = N(ab)$. Apply the division algorithm

$$a = q(ab) + r,$$
where $r = 0$ or $N(r) < N(ab)$. If $r \neq 0$ then
$$r = a - qab = a(1 - qb) \implies N(ab) = N(a) \le N(a(1 - qb)) = N(r),$$

contradicting that $N(r) < N(ab)$. Hence $r = 0$ and $a = q(ab) = (qb)a$. Then

$$a = (qb)a = 1 \cdot a \implies qb = 1,$$

since there are no zero divisors in an integral domain and $a \neq 0$. Hence b is a unit. Since
$N(a) \le N(ab)$ always holds, it follows that if b is not a unit we must have $N(a) < N(ab)$.


We next need the concept of a gcd. 


Definition 6.2.5 Let R be a Euclidean domain and let $r_1, r_2 \in R$. If $r_2 \neq 0$ then
$d \in R$ is a gcd for $r_1, r_2$ if $d \mid r_1$ and $d \mid r_2$, and if $d_1 \mid r_1$ and $d_1 \mid r_2$ then $d_1 \mid d$. If
$r_1 = r_2 = 0$ then $d = 0$ is the gcd of $r_1, r_2$.


In Z gcds are unique if we choose d to be positive. In general they are only
unique up to associates.


Lemma 6.2.8 Any two gcds of $r_1, r_2 \in R$ are associates. Further, an associate of a
gcd of $r_1, r_2$ is also a gcd.


The proof is straightforward and we leave it to the exercises. 


Lemma 6.2.9 Suppose R is a Euclidean domain and $r_1, r_2 \in R$ with $r_2 \neq 0$. Then a
gcd d for $r_1, r_2$ exists and is expressible as a linear combination with minimal norm.
That is, there exist $x, y \in R$ with

$$d = r_1x + r_2y$$
and $N(d) \le N(d_1)$ for any other nonzero linear combination $d_1 = r_1u + r_2v$ of $r_1, r_2$.


Further, if $r_1 \neq 0$ and $r_2 \neq 0$ then a gcd can be found by the Euclidean algorithm
exactly as in Z and F[x].
The proof of this lemma, except for uniqueness, which by Lemma 6.2.8 holds only
up to associates, is identical to the proof in Z and we leave it to the exercises
(see Chapter 2).

Unique factorization will follow from the analog of Euclid’s lemma. 


Lemma 6.2.10 (Euclid’s Lemma) Suppose R is a Euclidean domain and $r \in R$ is a
prime. If $r \mid r_1r_2$ then $r \mid r_1$ or $r \mid r_2$.


Proof Suppose $r \mid r_1r_2$. If r does not divide $r_1$, then the gcd of r and $r_1$ must be a unit
u, since the only factors of r are units and associates of r. Then from Lemma 6.2.8, 1
is also a gcd, since 1 is an associate of any unit. Therefore there exist $x, y \in R$ with
$$1 = r_1x + ry.$$

Multiplying through by $r_2$ we obtain

$$r_2 = (r_1r_2)x + r_2ry.$$

Since $r \mid r_1r_2$ and $r \mid r$ it follows that $r \mid r_2$.


We can now prove Theorem 6.2.4. Suppose that R is a Euclidean domain. We must
show that R is a UFD. First let $r \in R$ with $r \neq 0$. To show that r either is a unit or has
a prime factorization we use induction on the norm. If N(r) is minimal then $N(r) =
N(1)$ and r is a unit. Suppose that N(r) is the minimal norm greater than N(1). We
claim that r must be a prime. If $r = r_1r_2$ and neither $r_1$ nor $r_2$ were units, then from Lemma
6.2.7 both $N(r_1) < N(r)$ and $N(r_2) < N(r)$, contradicting the minimality of N(r)
among nonunits. Therefore r is a prime and the beginning of the induction is correct.
Assume that if $N(r) < k$ then r has a prime factorization, and suppose that
$N(r) = k$. If r is prime then it certainly has a prime factorization. If r is not prime
then $r = r_1r_2$ with both $r_1, r_2$ nonunits. Then $N(r_1) < N(r)$ and $N(r_2) < N(r)$, and
from the inductive hypothesis both $r_1$ and $r_2$ have prime factorizations, and hence so
does r.

The uniqueness of the factorization, at least up to units and ordering, follows
almost identically to what was done in Z. Notice that if r, s are both primes in R and
$r \mid s$ then r, s are associates. Then, as in Z, assume that r has two prime factorizations

$$r = r_1 \cdots r_k = s_1 \cdots s_t$$
with $r_1, \ldots, r_k, s_1, \ldots, s_t$ all primes in R. We now apply Euclid’s Lemma repeatedly
to get that each $r_i$ is an associate of some $s_j$ and $k = t$. We leave the details to the
exercises.

We now apply these ideas to the Gaussian integers

$$Z[i] = \{a + bi : a, b \in Z\}.$$

It was first observed by Gauss that this set permits unique factorization. To show this
we need a Euclidean norm on Z[i].
Definition 6.2.6 If $z = a + bi \in Z[i]$ then its norm N(z) is defined by
$$N(a + bi) = a^2 + b^2.$$
The basic properties of this norm follow directly from the definition (see exercises).


Lemma 6.2.11 If $\alpha, \beta \in Z[i]$ then:

1. $N(\alpha)$ is an integer for all $\alpha \in Z[i]$.
2. $N(\alpha) \ge 0$ for all $\alpha \in Z[i]$.
3. $N(\alpha) = 0$ if and only if $\alpha = 0$.
4. $N(\alpha) \ge 1$ for all $\alpha \neq 0$.
5. $N(\alpha\beta) = N(\alpha)N(\beta)$, that is, the norm is multiplicative.


From the multiplicativity of the norm we have the following concerning primes
and units in Z[i].


Lemma 6.2.12 (1) $u \in Z[i]$ is a unit if and only if $N(u) = 1$.
(2) If $\pi \in Z[i]$ and $N(\pi) = p$, where p is an ordinary prime in Z, then π is a prime
in Z[i].


Proof If $uv = 1$ then $N(u)N(v) = N(1) = 1$, so $N(u) = 1$; conversely, if $N(u) = 1$ then $u\bar{u} = 1$ with $\bar{u} \in Z[i]$, so u is a unit. This proves the first part.

Suppose next that $\pi \in Z[i]$ with $N(\pi) = p$ for some prime $p \in Z$. Then π is not a unit, since $N(\pi) = p > 1$. Suppose that $\pi =
\pi_1\pi_2$. From the multiplicativity of the norm we have

$$N(\pi) = p = N(\pi_1)N(\pi_2).$$
Since each norm is a positive ordinary integer and p is a prime, it follows that either
$N(\pi_1) = 1$ or $N(\pi_2) = 1$. Hence either $\pi_1$ or $\pi_2$ is a unit. Therefore π is a prime in
Z[i].


Armed with this norm we can show that Z[i] is a Euclidean domain.
Theorem 6.2.5 The Gaussian integers Z[i] form a Euclidean domain.


Proof That Z[i] forms a commutative ring with an identity can be verified directly
and easily. If $\alpha\beta = 0$ then $N(\alpha)N(\beta) = 0$, and since there are no zero divisors in Z
we must have $N(\alpha) = 0$ or $N(\beta) = 0$. But then either $\alpha = 0$ or $\beta = 0$, and hence
Z[i] is an integral domain. To complete the proof we show that the norm N is a
Euclidean norm.

From the multiplicativity of the norm we have, if $\alpha, \beta \neq 0$,

$$N(\alpha\beta) = N(\alpha)N(\beta) \ge N(\alpha) \quad \text{since } N(\beta) \ge 1.$$

Therefore property (1) of Euclidean norms is satisfied. We must now show that the
division algorithm holds.
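Let $\alpha = a + bi$ and $\beta = c + di$ be Gaussian integers with $\beta \neq 0$. Recall that for a nonzero
complex number $z = x + iy$ its inverse is

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} = \frac{x - iy}{x^2 + y^2}.$$

Therefore, as a complex number,

$$\frac{\alpha}{\beta} = \frac{\alpha\bar{\beta}}{\beta\bar{\beta}} = \frac{(a + bi)(c - di)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + i\,\frac{bc - ad}{c^2 + d^2} = u + iv.$$

Now since a, b, c, d are integers, u and v must be rationals. The set
$$\{u + iv : u, v \in Q\}$$

is called the Gaussian rationals.

If $u, v \in Z$ then $u + iv \in Z[i]$, $\alpha = q\beta$ with $q = u + iv$, and we are done. Otherwise choose ordinary integers m, n satisfying $|u - m| \le \frac{1}{2}$ and $|v - n| \le \frac{1}{2}$ and let
$q = m + in$. Then $q \in Z[i]$. Let $r = \alpha - q\beta$. We must show that $N(r) < N(\beta)$.

Working with complex absolute values we get

$$|r| = |\alpha - q\beta| = |\beta|\left|\frac{\alpha}{\beta} - q\right|.$$

Now
$$\left|\frac{\alpha}{\beta} - q\right| = |(u - m) + i(v - n)| = \sqrt{(u - m)^2 + (v - n)^2} \le \sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} < 1.$$
Therefore

$$|r| < |\beta| \implies |r|^2 < |\beta|^2 \implies N(r) < N(\beta),$$

completing the proof.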
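The proof is constructive: round the real and imaginary parts of $\alpha/\beta$ to nearest integers. The following Python sketch, with hypothetical helper names, stores a Gaussian integer $a + bi$ as the pair `(a, b)` and performs the rounding with exact integer arithmetic rather than floating point, so that $|u - m| \le \frac{1}{2}$ and $|v - n| \le \frac{1}{2}$ hold exactly. Dividing $7 + 5i$ by $2 - i$ gives $q = 2 + 3i$ and $r = i$, with $N(r) = 1 < 5 = N(\beta)$.

```python
def norm(z):
    a, b = z
    return a * a + b * b

def nearest(x, n):
    """An integer m with |x/n - m| <= 1/2 (n > 0), in exact integer arithmetic."""
    return (2 * x + n) // (2 * n)

def gauss_divmod(alpha, beta):
    """Return (q, r) with alpha = q*beta + r and N(r) < N(beta), as in the proof."""
    a, b = alpha
    c, d = beta
    n = norm(beta)
    if n == 0:
        raise ZeroDivisionError("beta must be nonzero")
    # alpha / beta = u + iv with u = (ac + bd)/n and v = (bc - ad)/n
    m = nearest(a * c + b * d, n)      # real part of q
    k = nearest(b * c - a * d, n)      # imaginary part of q
    # (m + ki)(c + di) = (mc - kd) + (md + kc)i
    r = (a - (m * c - k * d), b - (m * d + k * c))
    return (m, k), r

q, r = gauss_divmod((7, 5), (2, -1))   # divide 7 + 5i by 2 - i
print(q, r)
```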


Since Z[i] forms a Euclidean domain, it follows from our previous results that
Z[i] must be a UFD.


Corollary 6.2.2 The Gaussian integers are a UFD.


Since we will now be dealing with many kinds of integers, we will refer to the
ordinary integers Z as the rational integers and the ordinary primes p as the rational
primes. It is clear that Z can be embedded into Z[i]. However, not every rational prime
is also prime in Z[i]. The primes in Z[i] are called the Gaussian primes. For example,
both $1 + i$ and $1 - i$ are Gaussian primes, that is, primes in Z[i], since each has norm 2 (Lemma 6.2.12).
However, $(1 + i)(1 - i) = 2$, so that the rational prime 2 is not a prime in Z[i]. Using
the multiplicativity of the Euclidean norm in Z[i] we can describe all the units and
primes in Z[i].


Theorem 6.2.6 Consider the Gaussian integers Z[i].


1. The only units in Z[i] are $\pm 1, \pm i$.
2. Suppose π is a Gaussian prime. Then π is either:

(a) a positive rational prime $p \equiv 3 \bmod 4$ or an associate of such a rational
prime;

(b) $1 + i$ or an associate of $1 + i$; or

(c) $a + bi$ or $a - bi$, where $a > 0$, $b > 0$, a is even, and $N(\pi) = a^2 + b^2 = p$
with p a rational prime congruent to 1 mod 4, or an associate of $a + bi$ or
$a - bi$.


Proof (1) Suppose $u = x + iy \in Z[i]$ is a unit. Then from Lemma 6.2.12 we have
$N(u) = x^2 + y^2 = 1$, implying that $(x, y) = (0, \pm 1)$ or $(x, y) = (\pm 1, 0)$. Hence $u =
\pm 1$ or $u = \pm i$.

(2) Now suppose that π is a Gaussian prime. Since $N(\pi) = \pi\bar{\pi}$ and $\bar{\pi} \in Z[i]$, it
follows that $\pi \mid N(\pi)$. N(π) is a rational integer, so $N(\pi) = p_1 \cdots p_k$, where the $p_i$
are rational primes. By Euclid’s lemma $\pi \mid p_i$ for some $p_i$, and hence a Gaussian prime
must divide at least one rational prime. On the other hand, suppose $\pi \mid p$ and $\pi \mid q$, where
p, q are different primes. Then $(p, q) = 1$ and hence there exist $x, y \in Z$ such that
$1 = px + qy$. It follows that $\pi \mid 1$, a contradiction. Therefore a Gaussian prime divides
one and only one rational prime.

Let p be the rational prime that π divides. Then $N(\pi) \mid N(p) = p^2$. Since N(π)
is a rational integer greater than 1, it follows that $N(\pi) = p$ or $N(\pi) = p^2$. If $\pi = a + bi$ then
$a^2 + b^2 = p$ or $a^2 + b^2 = p^2$.

If $p = 2$ then $a^2 + b^2 = 2$ or $a^2 + b^2 = 4$. It follows that $\pi = \pm 2, \pm 2i$ or $\pi =
1 + i$ or an associate of $1 + i$. Since $(1 + i)(1 - i) = 2$ and neither $1 + i$ nor $1 - i$
is a unit, it follows that neither 2 nor any of its associates is prime. Then $\pi =
1 + i$ or an associate of $1 + i$. To see that $1 + i$ is prime, suppose $1 + i = \alpha\beta$. Then
$N(1 + i) = 2 = N(\alpha)N(\beta)$. It follows that either $N(\alpha) = 1$ or $N(\beta) = 1$, and so either
α or β is a unit.

If $p \neq 2$ then either $p \equiv 3 \bmod 4$ or $p \equiv 1 \bmod 4$. Suppose first that $p \equiv 3
\bmod 4$. Then $a^2 + b^2 = p$ would imply from Fermat’s two-square theorem (see
Chapter 2) that $p \equiv 1 \bmod 4$. Therefore from the remarks above $a^2 + b^2 = p^2$ and
$N(\pi) = N(p)$. Since $\pi \mid p$ we have $p = \alpha\pi$ with $\alpha \in Z[i]$. From $N(\pi) = N(p)$ we
get that $N(\alpha) = 1$ and α is a unit. Therefore π and p are associates. Hence in this
case π is an associate of a rational prime congruent to 3 mod 4.

Finally suppose $p \equiv 1 \bmod 4$. From the remarks above either $N(\pi) = p$ or
$N(\pi) = p^2$. Suppose $N(\pi) = p^2$, so that $a^2 + b^2 = p^2$. Since $p \equiv 1 \bmod 4$, from Fermat’s
two-square theorem there exist $m, n \in Z$ with $m^2 + n^2 = p$. Let $u = m + in$; then
$N(u) = p$. Since p is a rational prime, it follows from Lemma 6.2.12 that u
is a Gaussian prime. Similarly its conjugate $\bar{u}$ is also a Gaussian prime. Now $u\bar{u} = p$,
so $u\bar{u}u\bar{u} = p^2 = N(\pi)$. Since $\pi \mid N(\pi)$ it follows that $\pi \mid u\bar{u}u\bar{u}$, and from Euclid’s Lemma either
$\pi \mid u$ or $\pi \mid \bar{u}$. If $\pi \mid u$ they are associates since both are primes. But this is a contradiction
since $N(\pi) = p^2 \neq p = N(u)$. The same is true if $\pi \mid \bar{u}$. It follows that if $p \equiv 1 \bmod 4$ then
$N(\pi) \neq p^2$. Therefore in this case $N(\pi) = p = a^2 + b^2$. Replacing π by a suitable associate we may assume
$a, b > 0$ (see exercises). Further, since $a^2 + b^2 = p$ is odd, exactly one of a and b is even. If
a is odd then b is even, and then $-i\pi = b - ai$ is an associate of π of the form in (c) with even real part, completing the
proof.
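Theorem 6.2.6 yields a simple test for Gaussian primality. The sketch below is an illustration that assumes a Gaussian integer $a + bi$ is given as a pair of Python ints. Elements on the coordinate axes are associates of rational integers and are handled by case (a); every other element is a Gaussian prime exactly when its norm $a^2 + b^2$ is a rational prime (Lemma 6.2.12 together with cases (b) and (c)). The helper names are hypothetical, and trial division is used only because the inputs are small.

```python
def is_rational_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    """Decide whether a + bi is a Gaussian prime using Theorem 6.2.6.

    On an axis (a == 0 or b == 0), a + bi is an associate of the rational
    integer n = |a + b|, and is prime iff n is a rational prime = 3 mod 4.
    Off the axes, a + bi is prime iff N = a^2 + b^2 is a rational prime
    (N = 2 gives the associates of 1 + i, N = p = 1 mod 4 gives case (c)).
    """
    if a == 0 or b == 0:
        n = abs(a + b)
        return is_rational_prime(n) and n % 4 == 3
    return is_rational_prime(a * a + b * b)

# Gaussian primes a + bi with 0 <= a, b <= 3
print([(a, b) for a in range(4) for b in range(4) if is_gaussian_prime(a, b)])
```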


In the proof above we used Fermat’s two-square theorem. Gauss’s original motivation in investigating the complex integers was to prove results in elementary number
theory. As an application of unique factorization in Z[i] we give another proof of the
Fermat two-square theorem in the following form.


Theorem 6.2.7 Let p be an odd rational prime. Then $p = a^2 + b^2$ for $a, b \in Z$ if
and only if $p \equiv 1 \bmod 4$.


Proof Suppose first that $p = a^2 + b^2$. Since p is odd, one of a, b is even and the
other is odd. Suppose $a = 2n$ and $b = 2m + 1$; then

$$p = a^2 + b^2 = (2n)^2 + (2m + 1)^2 = 4n^2 + 4m^2 + 4m + 1 = 4(n^2 + m^2 + m) + 1,$$

and therefore $p \equiv 1 \bmod 4$.

Conversely suppose that $p \equiv 1 \bmod 4$. From Chapter 3 we then have that $-1$
is a quadratic residue mod p, that is, there exists an integer x such that $x^2 + 1 \equiv 0 \bmod p$. Then $p \mid (x^2 + 1)$, so $p \mid (x + i)(x - i)$. If p were prime in Z[i] (we cannot use the
characterization of primes in Z[i] since we used the two-square theorem in that
proof), then $p \mid (x + i)$ or $p \mid (x - i)$. If $p \mid (x + i)$ then $x + i = p(a + bi)$ for some
integers a, b. This would imply that $pb = 1$, which is impossible. Hence p cannot
divide $x + i$. An identical argument shows that p cannot divide $x - i$. Therefore p
cannot be a Gaussian prime.

Since p is neither a Gaussian prime nor a unit, we have a factorization $p = (a + bi)(c + di)$,
where neither factor is a unit. Then

$$N(p) = p^2 = (a^2 + b^2)(c^2 + d^2).$$
Since p is prime and $a^2 + b^2 \neq 1$, this implies that $a^2 + b^2 = p$ or $a^2 + b^2 = p^2$. If $a^2 + b^2 = p^2$
then $c^2 + d^2 = 1$ and $c + di$ is a unit, contradicting that it is not a unit. Therefore
$a^2 + b^2 = p$ and we are done.
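The converse half of this proof can be turned into a computation. One standard approach, sketched below as an illustration and not as the method of the text, first finds x with $x^2 + 1 \equiv 0 \bmod p$ and then computes a gcd of p and $x + i$ in Z[i] by the Euclidean algorithm. That gcd turns out to have norm exactly p: it is not an associate of p, since p does not divide $x + i$, and a short argument with conjugates shows it is not a unit. The helper names are hypothetical, Gaussian integers are stored as pairs of ints, and the brute-force search for x is only meant for small p.

```python
def gauss_mul(z, w):
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def gauss_mod(alpha, beta):
    """Remainder of alpha on division by beta, rounding alpha/beta to a nearest Gaussian integer."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    m = (2 * (a * c + b * d) + n) // (2 * n)   # nearest integer to Re(alpha/beta)
    k = (2 * (b * c - a * d) + n) // (2 * n)   # nearest integer to Im(alpha/beta)
    qb = gauss_mul((m, k), beta)
    return (a - qb[0], b - qb[1])

def gauss_gcd(alpha, beta):
    # Euclidean algorithm in Z[i]; the result is determined only up to a unit
    while beta != (0, 0):
        alpha, beta = beta, gauss_mod(alpha, beta)
    return alpha

def two_squares(p):
    """For a prime p = 1 mod 4 return (a, b) with a^2 + b^2 = p."""
    assert p % 4 == 1
    # -1 is a quadratic residue mod p; brute-force search for x
    x = next(t for t in range(1, p) if (t * t + 1) % p == 0)
    a, b = gauss_gcd((p, 0), (x, 1))   # a gcd of p and x + i
    return abs(a), abs(b)

print(two_squares(13), two_squares(97))
```

Taking absolute values at the end is harmless: multiplying by a unit only permutes and changes the signs of a and b, so $a^2 + b^2$ is unchanged.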


Finally we show that the methods used in Z[i] cannot be applied to all quadratic
integers. Kummer, as mentioned in Section 6.1, considered rings of the form

$$Z[\sqrt{-p}] = \{a + ib\sqrt{p} : a, b \in Z\}, \quad p \text{ a prime}.$$

One can then define the norm as $N(a + ib\sqrt{p}) = a^2 + pb^2$. This norm is multiplicative: $N(\alpha\beta) = N(\alpha)N(\beta)$. However, not all of these rings are UFD’s. We show,
for example, that there is not unique factorization in $Z[\sqrt{-5}]$.
By using the multiplicativity of the norm in $Z[\sqrt{-5}]$ it can be shown that 3, 7, $1 +
2i\sqrt{5}$, $1 - 2i\sqrt{5}$ are all primes and that no two of them are associates (see the exercises). However

$$21 = 3 \cdot 7 = (1 + 2i\sqrt{5})(1 - 2i\sqrt{5}).$$

Therefore factorization into primes in $Z[\sqrt{-5}]$ is not unique, and hence this ring is not
a UFD. We will examine these rings of quadratic integers more closely in Section
6.4 and consider the question of exactly which ones are UFD’s.
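The norm argument behind this exercise can be checked mechanically. If $\alpha = \beta\gamma$ with neither factor a unit, then $N(\beta)$ is a divisor of $N(\alpha)$ strictly between 1 and $N(\alpha)$ (the units of $Z[\sqrt{-5}]$ are the elements of norm 1, namely $\pm 1$). Since $N(3) = 9$, $N(7) = 49$ and $N(1 \pm 2i\sqrt{5}) = 21$, any proper factorization of these elements would need an element of norm 3 or 7. The brute-force Python sketch below, with hypothetical helper names, confirms that no such elements exist.

```python
def norm(a, b):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2."""
    return a * a + 5 * b * b

def has_element_of_norm(n):
    # the norm only depends on a^2 and b^2, so nonnegative a, b suffice, and
    # a^2 + 5b^2 = n forces a^2 <= n and 5b^2 <= n: a finite search
    return any(norm(a, b) == n
               for a in range(n + 1) if a * a <= n
               for b in range(n + 1) if 5 * b * b <= n)

# 21 = 3 * 7 = (1 + 2 sqrt(-5)) * (1 - 2 sqrt(-5)), and the norms agree
assert norm(3, 0) * norm(7, 0) == norm(1, 2) * norm(1, -2) == 21 ** 2

# no element has norm 3 or 7, so 3, 7 and 1 +- 2 sqrt(-5) cannot be factored
print(has_element_of_norm(3), has_element_of_norm(7))
```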


6.2.2 Principal Ideal Domains 


We now take a slightly different approach to UFD’s which will eventually lead us to
Dedekind’s theory of ideals.


Definition 6.2.7 An ideal I in an integral domain R is a subring with the property
that $RI \subseteq I$, that is, $ri \in I$ for all $r \in R$ and $i \in I$. An ideal is thus a subring closed
under multiplication from the whole ring.


In the rational integers Z the set nZ consisting of all multiples of n is an ideal.
We will see shortly that every ideal in Z has this form.


Theorem 6.2.8 Let R be an integral domain and $\alpha_1, \ldots, \alpha_n$ fixed elements of R.
Let $I = \{r_1\alpha_1 + \cdots + r_n\alpha_n : r_i \in R\}$. Then I forms an ideal in R called the ideal
generated by $\{\alpha_1, \ldots, \alpha_n\}$. We will denote this by

$$\langle \alpha_1, \ldots, \alpha_n \rangle.$$

If $n = 1$, so that $I = \langle \alpha \rangle$ with $\alpha \in R$, then I consists of all R-multiples of α. An
ideal of this form $\langle \alpha \rangle$ is called a principal ideal.


Proof The proof is straightforward. If $I = \{r_1\alpha_1 + \cdots + r_n\alpha_n : r_i \in R\}$ and $i_1 =
r_1\alpha_1 + \cdots + r_n\alpha_n$, $i_2 = s_1\alpha_1 + \cdots + s_n\alpha_n$ are two elements of I, then

$$i_1 \pm i_2 = (r_1 \pm s_1)\alpha_1 + \cdots + (r_n \pm s_n)\alpha_n \in I,$$
and hence I is closed under addition and additive inverses. If $r \in R$ then
$$ri_1 = (rr_1)\alpha_1 + \cdots + (rr_n)\alpha_n \in I,$$

so that I is closed under multiplication from R. Therefore $RI \subseteq I$, and in particular
$I \cdot I \subseteq I$, so I is closed under multiplication. Therefore I is an ideal.


Notice that $nZ = \langle n \rangle$ is a principal ideal. In the rational integers Z we have
the following.
Theorem 6.2.9 Every ideal in Z has the form nZ for some $n \in Z$. In particular
every ideal in Z is a principal ideal.


Proof Let I be an ideal in Z. If $I = \{0\}$ then $I = 0Z$. If $I \neq \{0\}$ then there exists
$z \in I$ with $z \neq 0$. Since I is a subring, $-z$ is also in I. Since either z or $-z$ is positive,
it follows that I must contain positive elements. Let n be the least positive element
of I. We show that $I = nZ$. Since $n \in I$ and I is an ideal, $nZ \subseteq I$.

Let a be a positive element of I. Then by the division algorithm

$$a = nq + r,$$

where $r = 0$ or $0 < r < n$. If $r \neq 0$ then $0 < r = a - nq < n$. Now $a \in I$, $n \in I$,
and hence nq and $a - nq$ are in I since I is an ideal. This contradicts the minimality of
n as the least positive element of I. Therefore $r = 0$ and $a = nq$. If a is a negative
element of I then $-a > 0$ and $-a = nq$. Then $a = n(-q)$. Hence every element of
I is a multiple of n, and therefore $I = nZ$.
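In Z the generator n of an ideal $\langle a_1, \ldots, a_k \rangle$ is the greatest common divisor of the $a_i$: n lies in the ideal, so it is a linear combination of the $a_i$, and each $a_i$ lies in $nZ$, so n divides each of them. The Python sketch below, whose helper names are hypothetical, computes n together with coefficients $r_i$ satisfying $n = r_1a_1 + \cdots + r_ka_k$ by repeatedly applying the extended Euclidean algorithm (compare Chapter 2); for $\langle 12, 18, 30 \rangle$ it returns $n = 6$.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and g = a*x + b*y."""
    x0, y0, x1, y1 = 1, 0, 0, 1      # a = x0*a_in + y0*b_in, b = x1*a_in + y1*b_in
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:                        # normalize the sign of the generator
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0

def ideal_generator(gens):
    """Return n >= 0 and coefficients r_i with n = sum r_i*a_i and <a_1, ..., a_k> = nZ."""
    n, coeffs = 0, []
    for a in gens:
        g, x, y = ext_gcd(n, a)      # g = n*x + a*y generates <n, a>
        coeffs = [c * x for c in coeffs] + [y]
        n = g
    return n, coeffs

n, rs = ideal_generator([12, 18, 30])
print(n, rs)
```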


Definition 6.2.8 A principal ideal domain, abbreviated as PID, is an integral 
domain where every ideal is a principal ideal. 


In this language, Theorem 6.2.9 says that the rational integers Z are a PID. The
same proof using degrees of polynomials would show that the polynomial ring F[x]
over a field F is also a PID. This is no accident since both are Euclidean domains
and the following is true.


Theorem 6.2.10 Any Euclidean domain R is a PID.


The proof is entirely analogous to the proof of Theorem 6.2.9, using the Euclidean
norm. We leave the details to the exercises. Hence Euclidean domains are both PID’s and UFD’s.
The latter will also follow from the next result, although we proved unique factorization
in Euclidean domains directly.


Theorem 6.2.11 Every PID R is a UFD. 


We use a series of lemmas to obtain a proof of the above result. As for Euclidean
domains, uniqueness of prime factorization depends on an analog of Euclid’s Lemma.
The existence of a prime factorization depends on a property of PID’s called the
ascending chain condition.


Lemma 6.2.13 Let R be an integral domain and $I_1 \subseteq I_2 \subseteq \cdots$ an ascending chain
of ideals of R. Then $I = \bigcup_i I_i$ is also an ideal.


Proof Let $r_1, r_2 \in I$. Then since $\{I_i\}$ is an ascending chain there exists an $I_n$ with
both $r_1, r_2 \in I_n$. Then $r_1 \pm r_2$ and $rr_1$ with $r \in R$ are all in $I_n$, since $I_n$ is an ideal.
But $I_n \subseteq I$, so all are in I, and hence I is an ideal.


We next show that in a PID every strictly increasing sequence of ideals must
terminate. We call this the ascending chain condition, or ACC, on ideals.
Definition 6.2.9 An integral domain R satisfies the ascending chain condition or
ACC on ideals if for every ascending chain of ideals $I_1 \subseteq I_2 \subseteq \cdots$ there exists a
positive integer n such that $I_i = I_n$ for all $i \ge n$. Equivalently, every strictly increasing
chain of ideals, that is, one with all inclusions proper, must have finite length.


Lemma 6.2.14 Every PID satisfies the ACC.


Proof Let $I_1 \subseteq I_2 \subseteq \cdots$ be an ascending chain of ideals in the PID R. Then $I = \bigcup_i I_i$
is an ideal in R. Since R is a PID we have $I = \langle r \rangle$ for some $r \in R$. Now $r \in I$,
so $r \in I_n$ for some $I_n$. Then for all $i \ge n$

$$\langle r \rangle \subseteq I_n \subseteq I_i \subseteq I = \langle r \rangle.$$

It follows that $I_i = I_n$ for all $i \ge n$ and R satisfies the ACC.


Finally we need the analog of Euclid’s Lemma. 


Lemma 6.2.15 (Euclid’s Lemma for PID’s) Suppose R is a PID and $p \in R$ is a
prime. If $p \mid ab$ then $p \mid a$ or $p \mid b$.


Proof Notice first the following relationships between divisibility and principal
ideals in a PID.

(i) $a \mid b$ if and only if $\langle b \rangle \subseteq \langle a \rangle$.

(ii) $\langle b \rangle = \langle c \rangle$ if and only if b and c are associates.

(iii) $\langle a \rangle = R$ if and only if a is a unit.

The proofs of these properties follow directly from the definitions (see exercises).

Now suppose that p is a prime in R and $p \mid ab$. Suppose p does not divide a.
Then $\langle a \rangle$ is not contained in $\langle p \rangle$. It follows that $I = \langle a, p \rangle$, the ideal
generated by a and p, is not equal to $\langle p \rangle$. Since R is a PID we have an element
$c \in R$ with $\langle a, p \rangle = \langle c \rangle$. Therefore $\langle p \rangle \subseteq \langle c \rangle$, so $p = cr$. Since p is
a prime, either c or r is a unit. If c is not a unit then r is a unit, so p and c are associates,
$\langle p \rangle = \langle c \rangle$, and hence $\langle a, p \rangle = \langle p \rangle$, a contradiction. Therefore c is a unit
and $\langle c \rangle = \langle a, p \rangle = R$, the whole integral domain. In the next subsection we
will see that what we have actually proved is that if p is a prime in a PID then $\langle p \rangle$
is a maximal ideal. Since $\langle a, p \rangle = R$ we must have $1 \in \langle a, p \rangle$, where 1
is the multiplicative identity, that is,

$$1 \in \langle a, p \rangle \implies ar + ps = 1 \text{ for some } r, s \in R.$$
As in the proof for rational integers, multiply through by b to obtain

$$abr + pbs = b.$$

Since $p \mid ab$ and $p \mid p$ it follows that $p \mid b$.


We can now prove Theorem 6.2.10. 
Proof (of Theorem 6.2.10). We show first that each nonzero nonunit in $R$ can be expressed as a product of primes. Let $r \in R$ with $r \neq 0$ and $r$ a nonunit. We show that there is a prime $p \in R$ which divides it. If $r$ is a prime we are done. If not, $r = r_1 s_1$ with neither $r_1$ nor $s_1$ a unit. It follows that

$$\langle r \rangle \subsetneq \langle r_1 \rangle.$$

If $r_1$ is prime then $r_1$ is a prime dividing $r$ and we are done. If not, continue in this manner to obtain an ascending chain of ideals

$$\langle r \rangle \subsetneq \langle r_1 \rangle \subsetneq \langle r_2 \rangle \subsetneq \cdots.$$

By the ACC this chain must terminate at some $\langle r_k \rangle$, and hence $r_k$ must be a prime. Hence $r$ must be divisible by at least one prime $p_1$. Therefore $r = p_1 s_1$. By the same argument, if $s_1$ is not a unit there is a prime $p_2 \mid s_1$, so that $r = p_1 p_2 s_2$. We cannot get an infinite factorization, since the ideals $\langle r \rangle \subsetneq \langle s_1 \rangle \subsetneq \langle s_2 \rangle \subsetneq \cdots$ would then form an infinite strictly ascending chain, contradicting the ACC. It follows that there must be a finite factorization $r = p_1 \cdots p_k$ with the $p_i$ all primes (a unit left over at the end can be absorbed into the last prime). Therefore there must be a prime factorization.

The uniqueness of this factorization up to ordering and units follows from Euclid's Lemma, analogously to all the previous cases. If $r = p_1 \cdots p_k = q_1 \cdots q_t$ with $p_i, q_j$ all primes in $R$, then $p_1 \mid q_j$ for some $j$. Since both are primes, $p_1$ and $q_j$ are associates. Cancelling and repeating, the argument now goes through as before, and in particular $k = t$.
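For $R = \mathbb{Z}$ the existence half of this proof can be sketched as a loop. This is only an illustration for integers $n \geq 2$; the helper names are hypothetical and trial division is our own choice of method. Split off a prime divisor, then repeat on the cofactor. Because the cofactors strictly decrease, the chain $\langle n \rangle \subsetneq \langle s_1 \rangle \subsetneq \langle s_2 \rangle \subsetneq \cdots$ must stop, which is the integer version of the ACC step.

```python
def smallest_prime_divisor(n: int) -> int:
    """Least d > 1 dividing n (n >= 2); such a d is necessarily prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n): n itself is prime


def prime_factorization(n: int) -> list[int]:
    """Write n >= 2 as p1 * p2 * ... * pk by peeling off one prime at a time."""
    factors = []
    s = n
    while s > 1:          # s plays the role of the cofactor s_i
        p = smallest_prime_divisor(s)
        factors.append(p)
        s //= p           # s strictly decreases, so the loop terminates
    return factors


print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```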


Hence every PID is a UFD. Are there UFD’s which are not PID’s? The answer is 
yes. To give an example we state the following theorem. This is not directly relevant 
to our subsequent work on algebraic numbers so we omit the proof (and sketch an 
outline of it in the exercises). 


Theorem 6.2.12 If $R$ is a UFD then the polynomial ring $R[x]$ is also a UFD.

From this result we have:

Corollary 6.2.3 $\mathbb{Z}[x]$ is a UFD.


Corollary 6.2.4 If $F$ is a field then $F[x_1, \ldots, x_n]$, the ring of polynomials in $n$ variables over $F$, is a UFD.


From this second corollary we get the example: $F[x, y]$ is a UFD for any field $F$. Let $I$ be the set of polynomials in $F[x, y]$ with constant term $0$. This forms an ideal, but it is not principal (see exercises).


6.2.3 Prime and Maximal Ideals 


Certain ideas arose in the proof of Theorem 6.2.11 which we look at a bit more 
closely. 
Definition 6.2.10 A proper ideal $I$ ($I \neq R$) in an integral domain $R$ is a prime ideal if whenever $r_1 r_2 \in I$ then either $r_1 \in I$ or $r_2 \in I$. A proper ideal $I$ is a maximal ideal if whenever $I \subseteq I_1$ with $I_1$ an ideal then either $I_1 = I$ or $I_1 = R$.


Hence a maximal ideal is a proper ideal which is contained in no larger ideal other than the whole integral domain. This is equivalent to $\langle I, r \rangle = R$ for every $r \notin I$. In the proof of Euclid's Lemma for PID's we actually showed that if $p$ is a prime then $\langle p \rangle$ is a maximal ideal. The general relationship between primes and the principal ideals they generate in PID's is given in the next theorem.


Theorem 6.2.13 Let $R$ be a PID and let $r \in R$ with $r \neq 0$. Then the following are equivalent:

1. $r \in R$ is prime.
2. $\langle r \rangle$ is a prime ideal.
3. $\langle r \rangle$ is a maximal ideal.

In particular in a PID a nonzero ideal is maximal if and only if it is prime. 


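Proof We show first that (1) is equivalent to (2). Suppose $r$ is a prime and $r_1 r_2 \in \langle r \rangle$. Then $r \mid r_1 r_2$, so by Euclid's Lemma $r \mid r_1$ or $r \mid r_2$. If $r \mid r_1$ then $r_1 \in \langle r \rangle$, while if $r \mid r_2$ then $r_2 \in \langle r \rangle$. It follows that $\langle r \rangle$ is a prime ideal.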

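Conversely suppose that $\langle r \rangle$ is a prime ideal and $r = r_1 r_2$. Since $r_1 r_2 \in \langle r \rangle$ we have either $r_1 \in \langle r \rangle$ or $r_2 \in \langle r \rangle$. If $r_1 \in \langle r \rangle$ then $r_1 = r_3 r$ and then

$$r = r_1 r_2 = (r_2 r_3) r \implies r_2 r_3 = 1.$$

Hence $r_2$ is a unit. Similarly if $r_2 \in \langle r \rangle$ then $r_1$ is a unit. It follows that $r$ is prime.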
The proof about maximality is essentially the proof of Euclid’s Lemma. 
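We now show that (1) is equivalent to (3). Suppose $r$ is a prime and $\langle r \rangle \subseteq I$. If $\langle r \rangle \neq I$ then there exists an $r_1 \in I$ with $r_1 \notin \langle r \rangle$. Hence $\langle r, r_1 \rangle \neq \langle r \rangle$. Since $R$ is a PID, $\langle r, r_1 \rangle = \langle r_2 \rangle$ for some $r_2 \in R$, so $r \in \langle r_2 \rangle$. Then $r_2 \mid r$ and hence $r_2$ is either a unit or an associate of $r$. If $r_2$ is a unit then $\langle r_2 \rangle = R$ and hence $I = R$. If $r_2$ is not a unit then $r_2$ is an associate of $r$ and hence

$$\langle r, r_1 \rangle = \langle r_2 \rangle = \langle r \rangle,$$

a contradiction since $r_1 \notin \langle r \rangle$. Hence $r_2$ is a unit, $I = R$, and $\langle r \rangle$ is a maximal ideal.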

Conversely suppose that $\langle r \rangle$ is maximal and $r_1 r_2 = r$. Suppose first that $r \mid r_1$. Since also $r_1 \mid r$, the elements $r$ and $r_1$ are associates. Now if $r$ does not divide $r_1$ then $r_1 \notin \langle r \rangle$, so that $\langle r, r_1 \rangle \neq \langle r \rangle$. It follows from the maximality of $\langle r \rangle$ that $\langle r, r_1 \rangle = R$. Hence $1 \in \langle r, r_1 \rangle$ and there exist $x, y \in R$ with

$$rx + r_1 y = 1.$$

Multiplying through by $r_2$ we have

$$r r_2 x + r_1 r_2 y = r_2.$$

Since $r_1 r_2 = r$, it follows that $r \mid r_2$. Therefore $r_2 = r_3 r$ and we have $r = r_1 r_2 = (r_1 r_3) r$. Hence $r_1 r_3 = 1$ and $r_1$ is a unit. So either $r_1$ is an associate of $r$, in which case $r_2$ is a unit, or $r_1$ is a unit. In either case one of $r_1, r_2$ is a unit. Therefore $r$ is prime.


In an integral domain $R$ we can use ideals to build factor rings. This is a fundamental concept in abstract algebra and will also play a role in algebraic number theory. We define this in general.


Definition 6.2.11 If $R$ is an integral domain and $I$ is an ideal in $R$ then a coset of $I$ is a subset of the form

$$r + I = \{ r + i : i \in I \}.$$

The set of cosets of $I$ in $R$ is denoted $R/I$.


Lemma 6.2.16 The set of cosets $R/I$ partitions $R$, and $r \in I$ if and only if $r + I = 0 + I$.


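Proof On $R$ define $r_1 \sim r_2$ if $r_1 - r_2 \in I$. This is an equivalence relation (see exercises) and therefore the equivalence classes partition $R$. If $r \in R$ its equivalence class $[r]$ is precisely the coset $r + I$. In particular $r \in I$ if and only if $r \sim 0$, that is, $r + I = 0 + I$.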


Next we define operations on $R/I$. If $[r_1] = r_1 + I$ and $[r_2] = r_2 + I$ then

$$[r_1] + [r_2] = (r_1 + r_2) + I = [r_1 + r_2],$$
$$[r_1][r_2] = (r_1 r_2) + I = [r_1 r_2].$$


Lemma 6.2.17 The operations defined on R/I are well-defined. 


Proof Well-defined means that if $[r_1] = [r_2]$ and $[r_3] = [r_4]$ then $[r_1] + [r_3] = [r_2] + [r_4]$ and $[r_1][r_3] = [r_2][r_4]$. We show this is true for addition and leave multiplication to the exercises.

Suppose $[r_1] = [r_2]$; then $r_1 \sim r_2$, so $r_1 - r_2 \in I$. Similarly if $[r_3] = [r_4]$ then $r_3 - r_4 \in I$. Then $(r_1 - r_2) + (r_3 - r_4) \in I$, which implies $(r_1 + r_3) - (r_2 + r_4) \in I$. Therefore $[r_1 + r_3] = [r_2 + r_4]$ and addition is well-defined.
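In $\mathbb{Z}$ with $I = n\mathbb{Z}$, a coset $r + I$ can be named by its least nonnegative residue $r \bmod n$, and well-definedness can be tested exhaustively over a window of representatives. The sketch below is our own, with hypothetical names; it checks both addition and the multiplicative case left to the exercises, for a single modulus at a time.

```python
def coset(r: int, n: int) -> int:
    """Name the coset r + nZ by its least nonnegative residue."""
    return r % n


def operations_well_defined(n: int, reps: range) -> bool:
    """Check: if [r1] = [r2] and [r3] = [r4], then [r1 + r3] = [r2 + r4]
    and [r1 r3] = [r2 r4], for all representatives taken from reps."""
    for r1 in reps:
        for r2 in reps:
            if coset(r1, n) != coset(r2, n):
                continue
            for r3 in reps:
                for r4 in reps:
                    if coset(r3, n) != coset(r4, n):
                        continue
                    if coset(r1 + r3, n) != coset(r2 + r4, n):
                        return False
                    if coset(r1 * r3, n) != coset(r2 * r4, n):
                        return False
    return True


print(operations_well_defined(6, range(-12, 13)))  # True
```

A finite search like this is evidence on examples, not a proof; the argument above is what covers every ideal.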


If $r_1 + I = r_2 + I$ we will also write $r_1 \equiv r_2 \bmod I$.


Theorem 6.2.14 Let $R$ be an integral domain and $I \subseteq R$ an ideal. Then

1. $R/I$ forms a commutative ring with an identity under the operations defined above.

2. $R/I$ is an integral domain if and only if $I$ is a prime ideal.

3. $R/I$ is a field if and only if $I$ is a maximal ideal.
The ring $R/I$ is called the factor ring or quotient ring of $R$ modulo $I$.


Proof The proof that $R/I$ is a commutative ring with an identity is a routine exercise. We show (2) and (3). Recall that the elements of $R/I$ are the cosets, which we will now denote as $[r]$, and that the additive identity is $[0]$, which we will just write as $0$ in $R/I$. Further, the multiplicative identity of $R/I$ is $[1]$, which we will write as $1$ in $R/I$.

Suppose $I$ is a prime ideal and suppose $[r_1][r_2] = [0] = 0$ in $R/I$. Then $r_1 r_2 \in I$, and so either $r_1 \in I$ or $r_2 \in I$. If $r_1 \in I$ then $[r_1] = 0$ in $R/I$, and if $r_2 \in I$ then $[r_2] = 0$ in $R/I$. Therefore there are no zero divisors in $R/I$, and hence it is an integral domain.

Conversely suppose $R/I$ is an integral domain and suppose $r_1 r_2 \in I$. Then $[r_1][r_2] = 0$ and since $R/I$ is an integral domain either $[r_1] = 0$ or $[r_2] = 0$. In the former case $r_1 \in I$ and in the latter $r_2 \in I$. Therefore $I$ is a prime ideal.

Next suppose that $I$ is maximal. If $[r] \neq 0$ in $R/I$ then $r \notin I$. From the maximality of $I$ it follows that $\langle I, r \rangle = R$ and then $1 \in \langle I, r \rangle$. This implies that there exist $x, y \in R$ with

$$rx + iy = 1 \quad \text{for some } i \in I.$$

But then in $R/I$ we have $[r][x] = [1] = 1$ since $[iy] = [0] = 0$. Hence in the factor ring $[r]$ is a unit. Since $[r]$ was an arbitrary nonzero element of $R/I$ it follows that $R/I$ is a field.

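Conversely suppose $R/I$ is a field. Then $I \neq R$, since $1 \neq 0$ in a field. If $r \notin I$ then $[r] \neq 0$ in $R/I$ and hence there exists an inverse $[x]$ with $[r][x] = 1$. Hence $1 - rx \in I$, so there are $i \in I$ and $y \in R$ with

$$rx + iy = 1.$$

It follows that $1 \in \langle I, r \rangle$, which implies that $\langle I, r \rangle = R$. Therefore $I$ is maximal.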
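For $R = \mathbb{Z}$, which is a PID, the nonzero proper ideals are $\langle n \rangle = n\mathbb{Z}$ with $n \geq 2$, and $\mathbb{Z}/\langle n \rangle$ is the ring of residues modulo $n$. A brute-force check (a sketch for small $n$ only; the function names are our own) lines up Theorems 6.2.13 and 6.2.14 on examples: $\mathbb{Z}/\langle n \rangle$ is an integral domain exactly when it is a field, exactly when $n$ is prime.

```python
import math


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def is_integral_domain(n: int) -> bool:
    """Z/nZ (n >= 2) has no zero divisors: [a][b] = 0 forces [a] = 0 or [b] = 0."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))


def is_field(n: int) -> bool:
    """Every nonzero class [a] of Z/nZ (n >= 2) has some [x] with [a][x] = 1."""
    return all(any((a * x) % n == 1 for x in range(1, n)) for a in range(1, n))


for n in range(2, 13):
    print(n, is_prime(n), is_integral_domain(n), is_field(n))
```

For $n = 6$, for instance, $[2][3] = 0$ exhibits zero divisors, matching the fact that $\langle 6 \rangle$ is neither prime nor maximal.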


Now a field $F$ is always an integral domain. Therefore if $R/I$ is a field it follows that $R/I$ is an integral domain. Translating this into statements about the ideal $I$ we have:


Corollary 6.2.5 In any integral domain a maximal ideal is a prime ideal. 


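Note that the converse of this corollary is not true in general, but it is true in a PID for nonzero prime ideals. For example, in $\mathbb{Z}[x]$ the ideal $\langle x \rangle$ is prime, since $\mathbb{Z}[x]/\langle x \rangle \cong \mathbb{Z}$ is an integral domain, but it is not maximal, since $\mathbb{Z}$ is not a field.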

Finally we sketch a beautiful application of these ideas called Kronecker's Theorem. Although it was proved by Kronecker well after the work of Galois, from a modern perspective it is really the starting point for Galois Theory. We will look more carefully at this in the next section.


Theorem 6.2.15 Let $F$ be a field and $p(x) \in F[x]$ an irreducible polynomial. Then there exists a field $F'$ with $F \subseteq F'$ in which $p(x)$ has a root.
Proof Since $p(x)$ is irreducible and $F[x]$ is a PID, the ideal $\langle p(x) \rangle$ is a maximal ideal by Theorem 6.2.13. Then the factor ring

$$F' = F[x]/\langle p(x) \rangle$$

is a field. The elements of $F'$ are cosets $g(x) + \langle p(x) \rangle$. If we identify $f \in F$ with the coset $f + \langle p(x) \rangle = [f]$ this gives an embedding of $F$ into $F'$. Therefore $F$ can be considered as a subfield of $F'$.

Now consider $[x] = x + \langle p(x) \rangle$. Then by considering the operations in $F'$ it is clear that $p([x]) = [p(x)]$ (see exercises). But $[p(x)] = p(x) + \langle p(x) \rangle = \langle p(x) \rangle = [0]$. Therefore in $F'$ we have $p([x]) = 0$, and $[x]$ is a root of $p(x)$ in $F'$.


We will give a well-known example to clarify the theorem. Let $F = \mathbb{R}$ and $p(x) = x^2 + 1$. Then $p(x)$ is irreducible in $\mathbb{R}[x]$. Let $\mathbb{R}' = \mathbb{R}[x]/\langle x^2 + 1 \rangle$. Since $x^2 + 1$ is prime, the ideal $\langle x^2 + 1 \rangle$ is a maximal ideal and hence $\mathbb{R}'$ is a field.

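Each element of $\mathbb{R}'$ is a polynomial in $\mathbb{R}[x]$ modulo $\langle x^2 + 1 \rangle$. By the division algorithm, if $h(x) \in \mathbb{R}[x]$ with $h(x) \neq 0$ then

$$h(x) = q(x)(x^2 + 1) + h_1(x) \quad \text{with } h_1(x) = 0 \text{ or } \deg(h_1(x)) < \deg(x^2 + 1) = 2.$$

Therefore $h_1(x) = a + bx$ with $a, b \in \mathbb{R}$. However

$$h(x) \equiv h_1(x) \bmod \langle x^2 + 1 \rangle.$$

It follows that every element of $\mathbb{R}'$ can be expressed as $a + bx$ with $a, b \in \mathbb{R}$. Therefore

$$\mathbb{R}' = \{ a + bx : a, b \in \mathbb{R} \}.$$

Further, in $\mathbb{R}'$ we have $x^2 + 1 = 0$ and hence $x^2 = -1$. Then

$$\mathbb{R}' = \{ a + bx : a, b \in \mathbb{R},\ x^2 = -1 \}.$$

Mapping $\mathbb{R}'$ onto $\mathbb{C}$, the complex numbers, by $1 \mapsto 1$, $x \mapsto i$ gives an isomorphism. Therefore $\mathbb{R}'$ is precisely $\mathbb{C}$, the complex numbers.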
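The computation in $\mathbb{R}'$ can be mirrored directly: reduce a polynomial modulo $x^2 + 1$ by repeatedly replacing $x^2$ with $-1$, then send the remainder $a + bx$ to $a + bi$. In the Python sketch below the helper names are hypothetical and floating-point numbers stand in for real coefficients; it checks on an example that this map respects multiplication.

```python
def poly_mul(u: list[float], v: list[float]) -> list[float]:
    """Product of coefficient lists (constant term first)."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def reduce_mod_x2_plus_1(coeffs: list[float]) -> tuple[float, float]:
    """Remainder a + b x of h(x) on division by x^2 + 1.

    From the top degree down, c x^k is replaced by -c x^(k-2),
    which is valid because x^2 = -1 in R[x]/<x^2 + 1>."""
    c = list(coeffs) + [0.0, 0.0]        # make sure c[0] and c[1] exist
    for k in range(len(c) - 1, 1, -1):
        c[k - 2] -= c[k]
        c[k] = 0.0
    return (c[0], c[1])


def to_complex(pair: tuple[float, float]) -> complex:
    """The isomorphism R' -> C sending 1 -> 1 and x -> i."""
    a, b = pair
    return complex(a, b)


g, h = [3.0, 2.0], [1.0, -4.0]           # 3 + 2x and 1 - 4x
prod = reduce_mod_x2_plus_1(poly_mul(g, h))
print(prod)                                                 # (11.0, -10.0)
print(to_complex(prod) == complex(3, 2) * complex(1, -4))   # True
```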


6.3 Algebraic Number Fields 


An algebraic number field is a finite field extension of the rational numbers Q within 
the complex numbers C. As before we must first look at some essential definitions 
from abstract algebra. 

If $F$ and $F'$ are fields with $F$ a subfield of $F'$, then $F'$ is an extension field, or simply an extension, of $F$. If we have a chain of fields and extension fields

$$F \subseteq E \subseteq E' \subseteq F'$$

then $F$ is called the ground field and $E$ and $E'$ are intermediate fields.
Recall that if $F$ is a field then a vector space $V$ over $F$ consists of an abelian group $V$ together with scalar multiplication from $F$ satisfying:

1. $fv \in V$ if $f \in F$, $v \in V$.
2. $f(u + v) = fu + fv$ for $f \in F$, $u, v \in V$.
3. $(f + g)v = fv + gv$ for $f, g \in F$, $v \in V$.
4. $(fg)v = f(gv)$ for $f, g \in F$, $v \in V$.
5. $1v = v$ for $v \in V$.


A set of elements $\{v_1, \ldots, v_n\}$ in a vector space $V$ is independent over $F$ if whenever $f_1 v_1 + \cdots + f_n v_n = 0$ then each scalar $f_i = 0$. If a set is not independent then it is called dependent. For a subset $U \subseteq V$ the set

$$\{ f_1 u_1 + \cdots + f_n u_n : n \geq 1,\ u_i \in U,\ f_i \in F \}$$

of linear combinations of elements of $U$ forms a subspace of $V$ called the span of $U$ or the subspace spanned by $U$. This is denoted by $\langle U \rangle$. If $U = \{v_1, \ldots, v_n\}$ is finite then we write $\langle U \rangle = \langle v_1, \ldots, v_n \rangle$. An independent set which spans the whole vector space $V$ is called a basis for $V$. The number of elements in a basis is unique and is called the dimension of $V$ over $F$, denoted $\dim_F V$ or just $\dim V$ if $F$ is understood. If there is a finite basis then $V$ is finite-dimensional over $F$.
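If $v_1, \ldots, v_n$ is a basis for $V$ and $w_1, \ldots, w_n$ is another set of vectors in $V$ then

$$\begin{aligned} w_1 &= f_{11} v_1 + \cdots + f_{1n} v_n, \\ w_2 &= f_{21} v_1 + \cdots + f_{2n} v_n, \\ &\ \ \vdots \\ w_n &= f_{n1} v_1 + \cdots + f_{nn} v_n \end{aligned}$$

for some scalars $f_{ij} \in F$. Then $w_1, \ldots, w_n$ is also a basis if and only if the transition matrix

$$\begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix}$$

has nonzero determinant.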

If $F'$ is an extension field of $F$ then products of elements of $F'$ with elements of $F$ are still in $F'$. Since $F'$ is an abelian group under addition, $F'$ can be considered as a vector space over $F$. Thus any extension field is a vector space over any of its subfields. The degree of the extension is the dimension of $F'$ as a vector space over $F$. We denote the degree by $|F' : F|$. If the degree is finite, that is, $|F' : F| < \infty$, so that $F'$ is a finite-dimensional vector space over $F$, then $F'$ is called a finite extension of $F$.

From vector space theory we easily obtain that the degrees are multiplicative. 
Specifically: 


Lemma 6.3.1 If $F \subseteq F' \subseteq F''$ are fields with $F''$ a finite extension of $F$, then $|F' : F|$ and $|F'' : F'|$ are also finite, and $|F'' : F| = |F'' : F'|\,|F' : F|$.


Proof The fact that $|F' : F|$ and $|F'' : F'|$ are also finite follows easily from linear algebra: $F'$ is a subspace of $F''$ over $F$, and the dimension of a subspace is at most the dimension of the whole vector space, while any finite spanning set for $F''$ over $F$ also spans $F''$ over $F'$.

If $|F' : F| = n$ with $\alpha_1, \ldots, \alpha_n$ a basis for $F'$ over $F$, and $|F'' : F'| = m$ with $\beta_1, \ldots, \beta_m$ a basis for $F''$ over $F'$, then the $mn$ products $\{\alpha_i \beta_j\}$ form a basis for $F''$ over $F$ (see the exercises). Then

$$|F'' : F| = mn = |F'' : F'|\,|F' : F|.$$

This last argument also shows that if $F \subseteq F' \subseteq F''$ are fields, with $|F' : F|$ and $|F'' : F'|$ finite, then $F''$ is a finite extension of $F$.


EXAMPLE 6.3.1 C is a finite extension of R, but R is an infinite extension of Q. 

The complex numbers 1, i form a basis for C over R. It follows that the degree 
of C over R is 2, that is, |C : R| = 2. 

That $\mathbb{R}$ is infinite dimensional over $\mathbb{Q}$ depends on the existence of transcendental numbers. An element $r \in \mathbb{R}$ is algebraic (over $\mathbb{Q}$) if it satisfies some nonzero polynomial with coefficients from $\mathbb{Q}$. That is, $P(r) = 0$, where

$$0 \neq P(x) = a_0 + a_1 x + \cdots + a_n x^n \quad \text{with } a_i \in \mathbb{Q}.$$

An element $r \in \mathbb{R}$ is transcendental if it is not algebraic.

In general it is very difficult to show that a particular element is transcendental. However there are uncountably many transcendental elements, as we will show in Section 6.3.2. Specific examples are our old friends $e$ and $\pi$. We give a proof of their transcendence later in this chapter.

Since $e$ is transcendental, for any natural number $n$ the set of vectors $\{1, e, e^2, \ldots, e^n\}$ must be independent over $\mathbb{Q}$, for otherwise there would be a polynomial that $e$ would satisfy. Therefore we have infinitely many independent vectors in $\mathbb{R}$ over $\mathbb{Q}$, which would be impossible if $\mathbb{R}$ had finite degree over $\mathbb{Q}$.


We are interested in special types of field extensions called algebraic extensions. 
We present the definitions in general and then specialize to extensions of the rationals 
Q within C. 


Definition 6.3.1 Suppose $F'$ is an extension field of $F$ and $\alpha \in F'$. Then $\alpha$ is algebraic over $F$ if there exists a nonzero polynomial $p(x)$ in $F[x]$ with $p(\alpha) = 0$ ($\alpha$ is a root of a polynomial with coefficients in $F$). If every element of $F'$ is algebraic over $F$, then $F'$ is an algebraic extension of $F$.

If $\alpha \in F'$ is nonalgebraic over $F$ then $\alpha$ is called transcendental over $F$. A nonalgebraic extension is called a transcendental extension.


Lemma 6.3.2. Every element of F is algebraic over F. 


Proof If $f \in F$ then $p(x) = x - f \in F[x]$ and $p(f) = 0$.


The tie-in to finite extensions is via the following theorem.

Theorem 6.3.1 If $F'$ is a finite extension of $F$, then $F'$ is an algebraic extension.


Proof Suppose $\alpha \in F'$. We must show that there exists a nonzero polynomial $p(x) \in F[x]$ with $p(\alpha) = 0$.

Since $F'$ is a finite extension, $|F' : F| = n < \infty$. This implies that there are $n$ elements in a basis for $F'$ over $F$, and hence any set of $n + 1$ elements in $F'$ must be linearly dependent over $F$.

Consider then $1, \alpha, \alpha^2, \ldots, \alpha^n$. These are $n + 1$ elements in $F'$ and therefore must be linearly dependent. Then there must exist elements

$$f_0, f_1, \ldots, f_n \in F,$$

not all zero, such that

$$f_0 + f_1 \alpha + \cdots + f_n \alpha^n = 0. \tag{6.3.1}$$

Let $p(x) = f_0 + f_1 x + \cdots + f_n x^n$. Then $p(x)$ is a nonzero element of $F[x]$ and $p(\alpha) = 0$ from (6.3.1). Therefore any $\alpha \in F'$ is algebraic over $F$, and hence $F'$ is an algebraic extension of $F$.
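The proof is constructive: write the powers of $\alpha$ in a basis and look for a linear dependence. As an illustration (our own example, not from the text), take $\gamma = \sqrt{2} + \sqrt{3}$ in $F' = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, assuming the standard fact that $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ is a basis of $F'$ over $\mathbb{Q}$, so that $n = 4$. The Python sketch below, with hypothetical function names, computes $\gamma^0, \ldots, \gamma^4$ in coordinates and solves for the dependence by exact rational Gauss-Jordan elimination.

```python
from fractions import Fraction

# An element a + b*sqrt2 + c*sqrt3 + d*sqrt6 of Q(sqrt2, sqrt3) is stored
# as its coordinate tuple (a, b, c, d) in the basis 1, sqrt2, sqrt3, sqrt6.


def mul(u, v):
    a1, b1, c1, d1 = u
    a2, b2, c2, d2 = v
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,  # coefficient of 1
        a1 * b2 + b1 * a2 + 3 * c1 * d2 + 3 * d1 * c2,      # sqrt2 (sqrt3*sqrt6 = 3 sqrt2)
        a1 * c2 + c1 * a2 + 2 * b1 * d2 + 2 * d1 * b2,      # sqrt3 (sqrt2*sqrt6 = 2 sqrt3)
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,              # sqrt6 (sqrt2*sqrt3 = sqrt6)
    )


def solve(matrix, rhs):
    """Gauss-Jordan elimination over Q for a nonsingular square system."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


gamma = (0, 1, 1, 0)                       # sqrt2 + sqrt3
powers = [(1, 0, 0, 0)]                    # gamma^0
for _ in range(4):
    powers.append(mul(powers[-1], gamma))  # gamma^1, ..., gamma^4

# Columns: coordinates of 1, gamma, gamma^2, gamma^3; right side: gamma^4.
matrix = [[powers[k][row] for k in range(4)] for row in range(4)]
c = solve(matrix, powers[4])               # gamma^4 = c0 + c1 g + c2 g^2 + c3 g^3
min_poly = [-ci for ci in c] + [1]         # coefficients, constant term first
print([int(x) for x in min_poly])          # [1, 0, -10, 0, 1], i.e. x^4 - 10x^2 + 1
```

Because the system has a unique solution, $1, \gamma, \gamma^2, \gamma^3$ are independent, so no nonzero polynomial of degree at most 3 has $\gamma$ as a root; the dependence found gives $\gamma^4 - 10\gamma^2 + 1 = 0$, and this quartic is the monic polynomial of least degree satisfied by $\gamma$.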


EXAMPLE 6.3.2 C is algebraic over R, but R is transcendental over Q. 

Since |C : R| = 2, C being algebraic over R follows from Theorem 6.3.1. More 
directly, if $z \in \mathbb{C}$ then $p(x) = (x - z)(x - \bar{z}) \in \mathbb{R}[x]$ and $p(z) = 0$.

$\mathbb{R}$ (and thus $\mathbb{C}$) being transcendental over $\mathbb{Q}$ follows from the existence of transcendental numbers such as $e$ and $\pi$.


If $\alpha$ is algebraic over $F$, it satisfies a polynomial over $F$. It follows that it must then also satisfy an irreducible polynomial over $F$. Since $F$ is a field, if $f \in F$ with $f \neq 0$ and $p(x) \in F[x]$, then $f^{-1} p(x) \in F[x]$ also. This implies that if $p(\alpha) = 0$ with $a_n \neq 0$ the leading coefficient of $p(x)$, then $p_1(x) = a_n^{-1} p(x)$ is a monic polynomial in $F[x]$ that $\alpha$ also satisfies. Thus if $\alpha$ is algebraic over $F$ there is a monic irreducible polynomial that $\alpha$ satisfies. The next result says that this polynomial is unique.
Lemma 6.3.3 If $\alpha \in F'$ is algebraic over $F$, then there exists a unique monic irreducible polynomial $p(x) \in F[x]$ such that $p(\alpha) = 0$.
This unique monic irreducible polynomial is denoted by $\operatorname{irr}(\alpha, F)$.


Proof Suppose $f(\alpha) = 0$ with $0 \neq f(x) \in F[x]$. Then $f(x)$ factors into irreducible polynomials. Since there are no zero divisors in a field, one of these factors, say $p_1(x)$, must also have $\alpha$ as a root. If the leading coefficient of $p_1(x)$ is $a$, then $p(x) = a^{-1} p_1(x)$ is a monic irreducible polynomial in $F[x]$ that also has $\alpha$ as a root.

Therefore there exist monic irreducible polynomials that have $\alpha$ as a root. Let $p(x)$ be one such polynomial of minimal degree. It remains to show that $p(x)$ is unique.

Suppose $g(x)$ is another monic irreducible polynomial with $g(\alpha) = 0$. Since $p(x)$ has minimal degree, $\deg p(x) \leq \deg g(x)$. By the division algorithm

$$g(x) = q(x) p(x) + r(x), \tag{6.3.2}$$

where $r(x) = 0$ or $\deg r(x) < \deg p(x)$. Substituting $\alpha$ into (6.3.2) we get

$$g(\alpha) = q(\alpha) p(\alpha) + r(\alpha),$$

which implies that $r(\alpha) = 0$ since $g(\alpha) = p(\alpha) = 0$. But then if $r(x)$ is not identically $0$, $\alpha$ is a root of $r(x)$, and by the argument above some monic irreducible factor of $r(x)$ has $\alpha$ as a root; this contradicts the minimality of the degree of $p(x)$. Therefore $r(x) = 0$ and $g(x) = q(x) p(x)$. The polynomial $q(x)$ must be a constant (a unit factor) since $g(x)$ is irreducible, but then $q(x) = 1$ since both $g(x)$ and $p(x)$ are monic. This says that $g(x) = p(x)$, and hence $p(x)$ is unique.


We say that an algebraic element $\alpha$ has degree $n$ if the degree of $\operatorname{irr}(\alpha, F)$ is $n$. Embedded in the proof of Lemma 6.3.3 is the following important corollary.


Corollary 6.3.1 If $\alpha$ is algebraic over $F$ and $f(\alpha) = 0$ for $f(x) \in F[x]$ then $\operatorname{irr}(\alpha, F) \mid f(x)$. That is, $\operatorname{irr}(\alpha, F)$ divides any polynomial over $F$ which has $\alpha$ as a root.


Suppose $\alpha \in F'$ is algebraic over $F$ and $p(x) = \operatorname{irr}(\alpha, F)$. Then there exists a smallest intermediate field $E$ with $F \subseteq E \subseteq F'$ such that $\alpha \in E$. By smallest we mean that if $E'$ is another intermediate field with $\alpha \in E'$ then $E \subseteq E'$. To see that this smallest field exists, notice that there are subfields $E'$ of $F'$ containing $F$ with $\alpha \in E'$ (namely $F'$ itself). Let $E$ be the intersection of all subfields of $F'$ containing $\alpha$ and $F$. $E$ is a subfield of $F'$ (see the exercises) and $E$ contains both $\alpha$ and $F$. Further, this intersection is contained in any other subfield containing $\alpha$ and $F$.

This smallest subfield has a very special form. 


Definition 6.3.2 Suppose $\alpha \in F'$ is algebraic over $F$ and

$$p(x) = \operatorname{irr}(\alpha, F) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n.$$

Let

$$F(\alpha) = \{ f_0 + f_1 \alpha + \cdots + f_{n-1} \alpha^{n-1} : f_i \in F \}.$$

On $F(\alpha)$ define addition and subtraction componentwise and define multiplication by algebraic manipulation, replacing powers of $\alpha$ higher than $\alpha^{n-1}$ by using

$$\alpha^n = -a_0 - a_1 \alpha - \cdots - a_{n-1} \alpha^{n-1}.$$


Theorem 6.3.2 $F(\alpha)$ forms a finite algebraic extension of $F$ with

$$|F(\alpha) : F| = \deg(\operatorname{irr}(\alpha, F)).$$

The field $F(\alpha)$ is the smallest subfield of $F'$ that contains $F$ and the root $\alpha$. A field extension of the form $F(\alpha)$ for some $\alpha$ is called a simple extension of $F$.


Proof Recall that $F_{n-1}[x]$ is the set of all polynomials over $F$ of degree $\leq n - 1$ together with the zero polynomial. This set forms a vector space of dimension $n$ over $F$. As defined in Definition 6.3.2, relative to addition and subtraction $F(\alpha)$ is the same as $F_{n-1}[x]$, and thus $F(\alpha)$ is a vector space of dimension $n = \deg \operatorname{irr}(\alpha, F)$ over $F$ and hence an abelian group.

Multiplication is done via multiplication of polynomials, so it is straightforward that $F(\alpha)$ forms a commutative ring with an identity. We must show that it forms a field. To do this we must show that every nonzero element of $F(\alpha)$ has a multiplicative inverse.

Suppose $0 \neq g(x) \in F[x]$. If $\deg g(x) < n = \deg \operatorname{irr}(\alpha, F)$, then $g(\alpha) \neq 0$ since $\operatorname{irr}(\alpha, F)$ is the irreducible polynomial of minimal degree that has $\alpha$ as a root.

If $h(x) \in F[x]$ with $\deg h(x) \geq n$, then $h(\alpha) = h_1(\alpha)$, where $h_1(x)$ is a polynomial of degree $\leq n - 1$, obtained by replacing powers of $\alpha$ higher than $\alpha^{n-1}$ by combinations of lower powers using

$$\alpha^n = -a_0 - a_1 \alpha - \cdots - a_{n-1} \alpha^{n-1}.$$

Now suppose $g(\alpha) \in F(\alpha)$, $g(\alpha) \neq 0$. Consider the corresponding polynomial $g(x) \in F[x]$ of degree $\leq n - 1$. Since $p(x) = \operatorname{irr}(\alpha, F)$ is irreducible, it follows that $g(x)$ and $p(x)$ must be relatively prime, that is, $(g(x), p(x)) = 1$. Therefore there exist $h(x), k(x) \in F[x]$ such that

$$g(x)h(x) + p(x)k(x) = 1.$$

Substituting $\alpha$ into the above we obtain

$$g(\alpha)h(\alpha) + p(\alpha)k(\alpha) = 1.$$
However, $p(\alpha) = 0$ and $h(\alpha) = h_1(\alpha) \in F(\alpha)$, so that

$$g(\alpha)h_1(\alpha) = 1.$$

It follows then that in $F(\alpha)$, $h_1(\alpha)$ is the multiplicative inverse of $g(\alpha)$. Since every nonzero element of $F(\alpha)$ has such an inverse, $F(\alpha)$ forms a field.

$F$ is contained in $F(\alpha)$ by identifying $F$ with the constant polynomials. Therefore $F(\alpha)$ is an extension field of $F$. From the definition of $F(\alpha)$, we have that $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ form a basis, so $F(\alpha)$ has degree $n$ over $F$. Therefore $F(\alpha)$ is a finite extension and hence an algebraic extension.

If $F \subseteq E \subseteq F'$ and $E$ contains $\alpha$, then clearly $E$ contains all powers of $\alpha$ since $E$ is a subfield. $E$ then contains $F(\alpha)$, and hence $F(\alpha)$ is the smallest subfield containing both $F$ and $\alpha$.


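EXAMPLE 6.3.3 Consider $p(x) = x^3 - 2$ over $\mathbb{Q}$. This is irreducible over $\mathbb{Q}$ but has the root $\alpha = 2^{1/3} \in \mathbb{R}$. The field $\mathbb{Q}(\alpha) = \mathbb{Q}(2^{1/3})$ is then the smallest subfield of $\mathbb{R}$ that contains $\mathbb{Q}$ and $2^{1/3}$.

Here

$$\mathbb{Q}(\alpha) = \{ q_0 + q_1 \alpha + q_2 \alpha^2 : q_i \in \mathbb{Q},\ \alpha^3 = 2 \}.$$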


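We first give examples of addition and multiplication in $\mathbb{Q}(\alpha)$. Let $g = 3 + 4\alpha + 5\alpha^2$ and $h = 2 - \alpha + \alpha^2$. Then

$$g + h = 5 + 3\alpha + 6\alpha^2$$

and

$$\begin{aligned} gh &= 6 - 3\alpha + 3\alpha^2 + 8\alpha - 4\alpha^2 + 4\alpha^3 + 10\alpha^2 - 5\alpha^3 + 5\alpha^4 \\ &= 6 + 5\alpha + 9\alpha^2 - \alpha^3 + 5\alpha^4. \end{aligned}$$

But $\alpha^3 = 2$, so $\alpha^4 = 2\alpha$, and then

$$gh = 6 + 5\alpha + 9\alpha^2 - 2 + 5(2\alpha) = 4 + 15\alpha + 9\alpha^2.$$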
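The reduction rule $\alpha^3 = 2$ can be coded directly. The sketch below stores an element $q_0 + q_1\alpha + q_2\alpha^2$ of $\mathbb{Q}(\alpha)$ as a triple of exact rationals using Python's `fractions.Fraction`; the function names are our own and not from the text.

```python
from fractions import Fraction


def add_q_alpha(g, h):
    """Componentwise sum in Q(alpha)."""
    return tuple(Fraction(x) + Fraction(y) for x, y in zip(g, h))


def mul_q_alpha(g, h):
    """Product of q0 + q1 a + q2 a^2 and r0 + r1 a + r2 a^2, where a^3 = 2."""
    prod = [Fraction(0)] * 5
    for i, x in enumerate(g):
        for j, y in enumerate(h):
            prod[i + j] += Fraction(x) * Fraction(y)
    # a^4 = 2a and a^3 = 2: fold degree 4 into degree 1, degree 3 into degree 0.
    prod[1] += 2 * prod[4]
    prod[0] += 2 * prod[3]
    return tuple(prod[:3])


g = (3, 4, 5)    # 3 + 4a + 5a^2
h = (2, -1, 1)   # 2 - a + a^2
print(add_q_alpha(g, h) == (5, 3, 6))    # True
print(mul_q_alpha(g, h) == (4, 15, 9))   # True
```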
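We now show how to find the inverse of $h$ in $\mathbb{Q}(\alpha)$. Let $h(x) = 2 - x + x^2$ and $p(x) = x^3 - 2$. Use the Euclidean algorithm as in Chapter 3 to express $1$ as a linear combination of $h(x)$ and $p(x)$:

$$x^3 - 2 = (x + 1)(x^2 - x + 2) + (-x - 4),$$

$$x^2 - x + 2 = (-x + 5)(-x - 4) + 22.$$

This implies that

$$\begin{aligned} 22 &= (x^2 - x + 2) - (-x + 5)\big((x^3 - 2) - (x + 1)(x^2 - x + 2)\big) \\ &= (x^2 - x + 2)(-x^2 + 4x + 6) - (x^3 - 2)(-x + 5), \end{aligned}$$

or

$$1 = \frac{1}{22}(x^2 - x + 2)(-x^2 + 4x + 6) - \frac{1}{22}(x^3 - 2)(-x + 5).$$

Now substituting $\alpha$ and using that $\alpha^3 - 2 = 0$, we have

$$1 = \frac{1}{22}(\alpha^2 - \alpha + 2)(-\alpha^2 + 4\alpha + 6),$$

and hence

$$h^{-1} = \frac{1}{22}(-\alpha^2 + 4\alpha + 6).$$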
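The inverse computation is the extended Euclidean algorithm in $\mathbb{Q}[x]$. The following Python sketch (hypothetical helper names; polynomials are coefficient lists of `Fraction`s with the constant term first) tracks a multiplier $u(x)$ with $r(x) \equiv u(x)h(x) \pmod{p(x)}$ for each remainder, and at the end divides by the constant last nonzero remainder.

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly(*coeffs):
    """Polynomial with rational coefficients, constant term first."""
    return trim([Fraction(c) for c in coeffs])


def poly_sub(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def poly_mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_divmod(a, b):
    """Division algorithm in Q[x]: a = quot*b + rem, rem = 0 or deg rem < deg b."""
    a, b = trim(a), trim(b)
    quot = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    rem = a
    while rem and len(rem) >= len(b):
        shift = len(rem) - len(b)
        coef = rem[-1] / b[-1]
        quot[shift] = coef
        rem = poly_sub(rem, [Fraction(0)] * shift + [coef * c for c in b])
    return trim(quot), rem


def inverse_mod(h, p):
    """u with u*h = 1 modulo p, assuming gcd(h, p) = 1 (extended Euclid)."""
    old_r, r = trim(h), trim(p)
    old_u, u = [Fraction(1)], []
    while r:
        # Invariant: old_r = old_u*h and r = u*h modulo p.
        q, rem = poly_divmod(old_r, r)
        old_r, r = r, rem
        old_u, u = u, poly_sub(old_u, poly_mul(q, u))
    assert len(old_r) == 1, "h and p are not relatively prime"
    c = old_r[0]                      # the last nonzero remainder, a constant
    return [x / c for x in poly_divmod(old_u, p)[1]]


h = poly(2, -1, 1)        # 2 - x + x^2
p = poly(-2, 0, 0, 1)     # x^3 - 2
inv = inverse_mod(h, p)
print(inv)                # [Fraction(3, 11), Fraction(2, 11), Fraction(-1, 22)]
print(poly_divmod(poly_mul(h, inv), p)[1])   # [Fraction(1, 1)]
```

The output $\frac{3}{11} + \frac{2}{11}x - \frac{1}{22}x^2$ is $\frac{1}{22}(6 + 4x - x^2)$, agreeing with $h^{-1}$ above.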


Now suppose $\alpha, \beta \in F'$ with both elements algebraic over $F$ and suppose $\operatorname{irr}(\alpha, F) = \operatorname{irr}(\beta, F)$. From the construction of $F(\alpha)$ we can see that it would be essentially the same as $F(\beta)$. We now make this idea precise.


Definition 6.3.3 Let $F'$, $F''$ be extension fields of $F$. An $F$-isomorphism is an isomorphism $\sigma : F' \to F''$ such that $\sigma(f) = f$ for all $f \in F$. That is, an $F$-isomorphism is an isomorphism of the extension fields that fixes each element of the ground field. If $F'$, $F''$ are $F$-isomorphic, we denote this relationship by $F' \cong_F F''$.


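Lemma 6.3.4 Suppose $\alpha, \beta \in F'$ are both algebraic over $F$ and suppose $\operatorname{irr}(\alpha, F) = \operatorname{irr}(\beta, F)$. Then $F(\alpha)$ is $F$-isomorphic to $F(\beta)$.

Proof Define the map $\sigma : F(\alpha) \to F(\beta)$ by $\sigma(\alpha) = \beta$ and $\sigma(f) = f$ for all $f \in F$. Extend $\sigma$ to a homomorphism, that is, require it to preserve addition and multiplication; this is consistent because $\alpha$ and $\beta$ obey the same reduction rule coming from $\operatorname{irr}(\alpha, F) = \operatorname{irr}(\beta, F)$. It follows then that $\sigma$ maps

$$f_0 + f_1 \alpha + \cdots + f_{n-1} \alpha^{n-1} \in F(\alpha) \quad \text{to} \quad f_0 + f_1 \beta + \cdots + f_{n-1} \beta^{n-1} \in F(\beta).$$

From this it is straightforward that $\sigma$ is an $F$-isomorphism.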


Further we note that if $\alpha, \beta \in F'$ are both algebraic over $F$ and $F(\alpha)$ is $F$-isomorphic to $F(\beta)$, then there is a $\gamma \in F(\beta)$ with $\operatorname{irr}(\alpha, F) = \operatorname{irr}(\gamma, F)$. We can take for $\gamma$ the image of $\alpha$ under the $F$-isomorphism.

If $\alpha, \beta \in F'$ are two algebraic elements over $F$, we use $F(\alpha, \beta)$ to denote $(F(\alpha))(\beta)$. $F(\alpha, \beta)$ and $F(\beta, \alpha)$ are $F$-isomorphic, so we treat them as the same. We now show that the set of algebraic elements over a ground field is closed under the arithmetic operations, and from this we obtain that the algebraic elements form a subfield.
Lemma 6.3.5 If $\alpha, \beta \in F'$, $\beta \neq 0$, are two algebraic elements over $F$, then $\alpha \pm \beta$, $\alpha\beta$, and $\alpha/\beta$ are also algebraic over $F$.

Proof Since $\alpha, \beta$ are algebraic, the subfield $F(\alpha, \beta)$ will be of finite degree over $F$ (by Theorem 6.3.2 and Lemma 6.3.1, since $\beta$ is also algebraic over $F(\alpha)$) and therefore algebraic over $F$. Now $\alpha, \beta \in F(\alpha, \beta)$, and since $F(\alpha, \beta)$ is a subfield, it follows that $\alpha \pm \beta$, $\alpha\beta$ and $\alpha/\beta$ are also elements of $F(\alpha, \beta)$. Since $F(\alpha, \beta)$ is an algebraic extension of $F$, each of these elements is algebraic over $F$.


Theorem 6.3.3 If $F'$ is an extension field of $F$, then the set of elements of $F'$ that are algebraic over $F$ forms a subfield. This subfield is called the algebraic closure of $F$ in $F'$.


Proof Let $A_F(F')$ be the set of algebraic elements over $F$ in $F'$. $A_F(F') \neq \emptyset$ since it contains $F$. From the previous lemma it is closed under addition, subtraction, multiplication, and division, and therefore it forms a subfield.


We close this subsection with a final result, that says that every finite extension is 
formed by taking successive simple extensions. 


Theorem 6.3.4 If $F'$ is a finite extension of $F$, then there exists a finite set of algebraic elements $\alpha_1, \ldots, \alpha_n$ such that $F' = F(\alpha_1, \ldots, \alpha_n)$.

Proof Suppose $|F' : F| = k < \infty$. Then $F'$ is algebraic over $F$. Choose an $\alpha_1 \in F'$, $\alpha_1 \notin F$. Then $F \subsetneq F(\alpha_1) \subseteq F'$ and, by Lemma 6.3.1, $|F' : F(\alpha_1)| < k$. If the degree of this extension is 1, then $F' = F(\alpha_1)$, and we are done. If not, choose an $\alpha_2 \in F'$, $\alpha_2 \notin F(\alpha_1)$. Then as above

$$F \subsetneq F(\alpha_1) \subsetneq F(\alpha_1, \alpha_2) \subseteq F' \quad \text{with } |F' : F(\alpha_1, \alpha_2)| < |F' : F(\alpha_1)|.$$

As before, if this degree is one we are done; if not, continue. Since $k$ is finite this process must terminate in a finite number of steps.


6.3.1 Algebraic Extensions of Q 


We now specialize to the case where the ground field is the rationals $\mathbb{Q}$. An algebraic number field is a finite, and hence algebraic, extension field of $\mathbb{Q}$ within $\mathbb{C}$. Hence an algebraic number field is a field $K$ such that

$$\mathbb{Q} \subseteq K \subseteq \mathbb{C}$$

with $|K : \mathbb{Q}| < \infty$. We will prove shortly that $K$ is actually a simple extension of $\mathbb{Q}$.


Definition 6.3.4 An algebraic number $\alpha$ is an element of $\mathbb{C}$ which is algebraic over $\mathbb{Q}$. Hence an algebraic number is an $\alpha \in \mathbb{C}$ such that $f(\alpha) = 0$ for some $0 \neq f(x) \in \mathbb{Q}[x]$. If $\alpha \in \mathbb{C}$ is not algebraic it is transcendental.
We will let $\mathcal{A}$ denote the totality of algebraic numbers within the complex numbers $\mathbb{C}$, and $\mathcal{T}$ the set of transcendentals, so that $\mathbb{C} = \mathcal{A} \cup \mathcal{T}$. In the language of the last subsection, $\mathcal{A}$ is the algebraic closure of $\mathbb{Q}$ within $\mathbb{C}$. As in the general case, if $\alpha \in \mathbb{C}$ is algebraic we will let $\operatorname{irr}(\alpha, \mathbb{Q})$ denote the unique monic irreducible polynomial of minimal degree that $\alpha$ satisfies over $\mathbb{Q}$. Then $\operatorname{irr}(\alpha, \mathbb{Q})$ divides any rational polynomial $p(x)$ which satisfies $p(\alpha) = 0$.

If $\alpha$ is an algebraic number then $\mathbb{Q}(\alpha)$ is the smallest subfield containing both $\mathbb{Q}$ and $\alpha$. Since $|\mathbb{Q}(\alpha) : \mathbb{Q}| = \deg(\operatorname{irr}(\alpha, \mathbb{Q}))$, it follows that $K = \mathbb{Q}(\alpha)$ is an algebraic number field. It then follows trivially that an algebraic number is any element of $\mathbb{C}$ which falls in an algebraic number field, and $\mathcal{A}$ is the union of all algebraic number fields.

We next need the following. 


Lemma 6.3.6 If $p(x) \in \mathbb{Q}[x]$ is irreducible of degree $n$ then $p(x)$ has $n$ pairwise distinct roots in $\mathbb{C}$.


Proof That $p(x)$ has $n$ roots (counted with multiplicity) is a consequence of the Fundamental Theorem of Algebra. What is important here is that if $p(x)$ is irreducible over $\mathbb{Q}$ then its roots in $\mathbb{C}$ are distinct.

Let $c$ be a root of $p(x)$. Then $c$ is an algebraic number and $\operatorname{irr}(c, \mathbb{Q}) \mid p(x)$. Since $p(x)$ is irreducible, it follows that $p(x)$ is just a constant multiple of $\operatorname{irr}(c, \mathbb{Q})$, and hence they have the same degree, which is minimal among the degrees of all nonzero rational polynomials which have $c$ as a root.

Suppose that $c$ is a double root. Then

$$p(x) = (x - c)^2 h(x) \quad \text{where } h(x) \in \mathbb{C}[x].$$

Now the formal derivative of a rational polynomial is also a rational polynomial. Therefore $p'(x) \in \mathbb{Q}[x]$. However from above, using the product rule,

$$p'(x) = 2(x - c)h(x) + (x - c)^2 h'(x).$$

Therefore $p'(c) = 0$. This is a contradiction, since $p'(x)$ is a nonzero rational polynomial with $\deg(p'(x)) < \deg(p(x))$. Therefore a root cannot be a double root, and hence all the $n$ roots are pairwise distinct.
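A computational counterpart of Lemma 6.3.6 (our own addition, not the argument used above): a polynomial has a repeated complex root exactly when it shares a nonconstant factor with its derivative, and that common factor can be found with the Euclidean algorithm in $\mathbb{Q}[x]$. The Python sketch below, with hypothetical helper names and exact rational arithmetic, contrasts the irreducible $x^3 - 2$ with $(x - 1)^2(x + 1)$.

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    """Formal derivative of a coefficient list (constant term first)."""
    return trim([k * p[k] for k in range(1, len(p))])


def poly_rem(a, b):
    """Remainder of a on division by a nonzero b in Q[x]."""
    rem, b = trim(a), trim(b)
    while rem and len(rem) >= len(b):
        shift = len(rem) - len(b)
        coef = rem[-1] / b[-1]
        for i, c in enumerate(b):
            rem[i + shift] -= coef * c
        rem = trim(rem)               # the leading term has cancelled
    return rem


def poly_gcd(a, b):
    """Monic gcd of a and b (not both zero) by the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_rem(a, b)
    return [c / a[-1] for c in a]


def has_repeated_root(p):
    return len(poly_gcd(p, derivative(p))) > 1   # nonconstant gcd


p = [Fraction(c) for c in (-2, 0, 0, 1)]     # x^3 - 2, irreducible over Q
q = [Fraction(c) for c in (1, -1, -1, 1)]    # (x - 1)^2 (x + 1)
print(has_repeated_root(p), has_repeated_root(q))   # False True
print(poly_gcd(q, derivative(q)))                   # the monic factor x - 1
```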


It follows that if $\alpha$ is an algebraic number of degree $n$ then its minimal polynomial $\operatorname{irr}(\alpha, \mathbb{Q})$ has $n$ distinct roots in $\mathbb{C}$.


Definition 6.3.5 If $\alpha$ is an algebraic number, then its conjugates over $\mathbb{Q}$ consist of the set $\{\alpha_1 = \alpha, \ldots, \alpha_n\}$ of distinct roots of $\operatorname{irr}(\alpha, \mathbb{Q})$ in $\mathbb{C}$.


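Since distinct monic irreducible polynomials cannot have a root in common, it follows that if $\alpha_i$ is conjugate to $\alpha$ then $\operatorname{irr}(\alpha_i, \mathbb{Q}) = \operatorname{irr}(\alpha, \mathbb{Q})$ (see exercises). It follows that $\mathbb{Q}(\alpha_i)$ is $\mathbb{Q}$-isomorphic (see last section) to $\mathbb{Q}(\alpha)$, with the $\mathbb{Q}$-isomorphism being given by $\sigma : 1 \mapsto 1$, $\alpha \mapsto \alpha_i$.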

We now get that any algebraic number field is actually a simple extension of Q. 
Theorem 6.3.5 Any algebraic number field $K$ is a simple extension of $\mathbb{Q}$, that is, $K = \mathbb{Q}(\alpha)$ for some algebraic number $\alpha$. Such an $\alpha$ is called a primitive element.


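Proof Since $K$ is a finite extension, $K = \mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ for some algebraic numbers $\alpha_1, \ldots, \alpha_n$ (Theorem 6.3.4). Then, by induction on $n$, to show that $K$ is a simple extension it is sufficient to show that $\mathbb{Q}(\alpha, \beta) = \mathbb{Q}(\gamma)$ for any algebraic numbers $\alpha, \beta$ and a suitable algebraic number $\gamma$.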

Let $\alpha_1 = \alpha, \ldots, \alpha_n$ be the conjugates of $\alpha$ over $\mathbb{Q}$ and let $\beta_1 = \beta, \ldots, \beta_m$ be the conjugates of $\beta$ over $\mathbb{Q}$. If $j \neq 1$ then $\beta_j \neq \beta$ since the conjugates are distinct. It follows that for each $i = 1, \ldots, n$ and each $j = 2, \ldots, m$ the equation

$$\alpha_i + \beta_j x = \alpha + \beta x$$

has exactly one complex solution and hence at most one rational solution. Since there are only finitely many such equations there are only finitely many rational solutions $x$, and therefore there exists a rational number $q$ with $q \neq 0$ and $q$ differing from all the solutions. That is,

$$\alpha_i + \beta_j q \neq \alpha + \beta q$$

for all $i$ and all $j \neq 1$.

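Let $\gamma = \alpha + q\beta$. We claim that $\mathbb{Q}(\alpha, \beta) = \mathbb{Q}(\gamma)$. Since $\mathbb{Q}(\alpha, \beta)$ contains all of $\mathbb{Q}$ as well as $\alpha$ and $\beta$, it is clear that $\gamma \in \mathbb{Q}(\alpha, \beta)$ and hence $\mathbb{Q}(\gamma) \subseteq \mathbb{Q}(\alpha, \beta)$. We show that $\mathbb{Q}(\alpha, \beta) \subseteq \mathbb{Q}(\gamma)$. Here it suffices to show that $\alpha, \beta \in \mathbb{Q}(\gamma)$.

Let $f(x) = \operatorname{irr}(\alpha, \mathbb{Q})$ and $g(x) = \operatorname{irr}(\beta, \mathbb{Q})$. Then $f(\gamma - q\beta) = f(\alpha) = 0$. Therefore $\beta$ is a root of the polynomials $g(x)$ and $h(x) = f(\gamma - qx)$. If $h(\beta_j) = f(\gamma - q\beta_j) = 0$ for some conjugate $\beta_j \neq \beta$, then $\gamma - q\beta_j = \alpha_i$ for some $\alpha_i$, contradicting the choice of $q$. Therefore $g(x)$ and $h(x)$ have only $\beta$ as a common root.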

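Now $g(x)$ and $h(x) = f(\gamma - qx)$ are polynomials in $K[x]$, where $K = \mathbb{Q}(\gamma)$. Since $\beta$ is algebraic over $\mathbb{Q} \subseteq K$, it is algebraic over $K$. Let $h_1(x) = \operatorname{irr}(\beta, K)$. Since $g(\beta) = 0$ and $h(\beta) = 0$ it follows that $h_1(x) \mid g(x)$ and $h_1(x) \mid h(x)$ in $K[x]$. Every root of $h_1(x)$ is then a root of both $g(x)$ and $h(x)$, and $\beta$ is the only common root of $g(x)$ and $h(x)$. Since the roots of the irreducible polynomial $h_1(x)$ are distinct (by the argument of Lemma 6.3.6), it follows that $h_1(x)$ must have degree one. Therefore

$$h_1(x) = ax + b \quad \text{for some } a, b \in K.$$

But $h_1(\beta) = 0$, so $\beta = -b/a \in K$. Therefore $\beta \in K = \mathbb{Q}(\gamma)$, and then $\alpha = \gamma - q\beta \in K$ as well. Hence $\mathbb{Q}(\alpha, \beta) \subseteq \mathbb{Q}(\gamma)$ and so

$$\mathbb{Q}(\alpha, \beta) = \mathbb{Q}(\gamma).$$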
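The choice of $q$ in this proof can be imitated numerically. In the sketch below (our own illustration) the conjugates are supplied by hand as floating-point or complex approximations, the tolerance `tol` is an arbitrary assumption, and we search only positive integers, which are nonzero rationals; since there are finitely many excluded values, the search terminates. For $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$ it returns $q = 1$, giving the primitive element $\sqrt{2} + \sqrt{3}$, while for $\alpha = i$, $\beta = -i$ the value $q = 1$ is excluded (it would give $\gamma = 0$).

```python
import math


def choose_q(alphas, betas, tol=1e-9):
    """Least positive integer q with alpha_i + beta_j*q != alpha + beta*q
    for every i and every j != 1 (alphas[0] = alpha, betas[0] = beta).

    Each excluded value solves alpha_i + beta_j*x = alpha + beta*x; the
    conjugates must be distinct so that beta - beta_j != 0."""
    alpha, beta = alphas[0], betas[0]
    excluded = [(ai - alpha) / (beta - bj) for ai in alphas for bj in betas[1:]]
    q = 1
    while any(abs(q - x) < tol for x in excluded):
        q += 1
    return q


r2, r3 = math.sqrt(2), math.sqrt(3)
print(choose_q([r2, -r2], [r3, -r3]))   # 1, so gamma = sqrt2 + sqrt3
print(choose_q([1j, -1j], [-1j, 1j]))   # 2, so gamma = i - 2i = -i
```

Floating-point comparison is only a heuristic stand-in for the exact inequality in the proof; with exact algebraic arithmetic the same search would be rigorous.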


Let $K$ be an algebraic number field and $\alpha$ a primitive element, so that $K = \mathbb{Q}(\alpha)$. It follows that $K$ must have at least one basis (as a vector space over $\mathbb{Q}$) of the form

$$1, \alpha, \alpha^2, \ldots, \alpha^{n-1},$$

where $n = |K : \mathbb{Q}|$. We will use this observation in Section 6.3.4 to define an invariant of a number field called its discriminant.


6.3.2 Algebraic and Transcendental Numbers 


In this section we examine the sets $\mathcal{A}$ and $\mathcal{T}$ more closely. Since $\mathcal{A}$ is precisely the algebraic closure of $\mathbb{Q}$ in $\mathbb{C}$, we have from our general result that $\mathcal{A}$ actually forms a subfield of $\mathbb{C}$. Further, since the intersection of subfields is again a subfield, it follows that $\mathcal{A}' = \mathcal{A} \cap \mathbb{R}$, the real algebraic numbers, form a subfield of the reals.


Theorem 6.3.6 The set $\mathcal{A}$ of algebraic numbers forms a subfield of $\mathbb{C}$. The subset $\mathcal{A}' = \mathcal{A} \cap \mathbb{R}$ of real algebraic numbers forms a subfield of $\mathbb{R}$.


Since each rational is algebraic it is clear that there are algebraic numbers. Further, there are irrational algebraic numbers, $\sqrt{2}$ for example, since it is a root of the irreducible polynomial $x^2 - 2$ over $\mathbb{Q}$. On the other hand, we have not examined the question of whether transcendental numbers really exist. To show that any particular complex number is transcendental is in general quite difficult. However, it is relatively easy to show that there are uncountably many transcendentals.


Theorem 6.3.7 The set $\mathcal{A}$ of algebraic numbers is countably infinite. Therefore $\mathcal{T}$, the set of transcendental numbers, and $\mathcal{T}' = \mathcal{T} \cap \mathbb{R}$, the real transcendental numbers, are uncountably infinite.


Proof Let

$$P_n = \{ f(x) \in \mathbb{Q}[x] : \deg(f(x)) \le n \}.$$

If $f(x) \in P_n$ then $f(x) = q_0 + q_1 x + \dots + q_n x^n$ with $q_i \in \mathbb{Q}$, so we can identify a polynomial of degree $\le n$ with an $(n+1)$-tuple $(q_0, q_1, \dots, q_n)$ of rational numbers. Therefore the set $P_n$ has the same size as the $(n+1)$-fold Cartesian product of $\mathbb{Q}$:

$$\mathbb{Q}^{n+1} = \mathbb{Q} \times \mathbb{Q} \times \dots \times \mathbb{Q}.$$

Since a finite Cartesian product of countable sets is still countable, it follows that $P_n$ is a countable set.
Now let

$$B_n = \bigcup_{p(x) \in P_n,\ p(x) \neq 0} \{\text{roots of } p(x)\},$$

that is, $B_n$ is the union of all roots in $\mathbb{C}$ of all nonzero rational polynomials of degree $\le n$. Since each such $p(x)$ has at most $n$ roots and since $P_n$ is countable, it follows that $B_n$ is a countable union of finite sets and hence is still countable. Now

$$\mathcal{A} = \bigcup_{n=1}^{\infty} B_n,$$

so $\mathcal{A}$ is a countable union of countable sets and is therefore countable. It is infinite since it contains $\mathbb{Q}$.

Since both $\mathbb{R}$ and $\mathbb{C}$ are uncountably infinite, the second assertions follow directly from the countability of $\mathcal{A}$. If, say, $\mathcal{T}$ were countable, then $\mathbb{C} = \mathcal{A} \cup \mathcal{T}$ would also be countable, which is a contradiction. The same argument with $\mathbb{R} = \mathcal{A}' \cup \mathcal{T}'$ applies to $\mathcal{T}'$.
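The countability argument can also be made explicit by listing polynomials. The sketch below uses a standard variant, not the argument in the text: to a nonzero integer polynomial $a_0 + a_1 x + \dots + a_d x^d$ with $a_d \neq 0$ assign the height $d + |a_0| + \dots + |a_d|$. Only finitely many polynomials have a given height, and clearing denominators shows every algebraic number is a root of some integer polynomial, so running through the heights $1, 2, 3, \dots$ enumerates $\mathcal{A}$ as a countable union of finite sets. The function name is hypothetical.

```python
from itertools import product


def polynomials_of_height(h):
    """All integer polynomials a_0 + a_1 x + ... + a_d x^d with a_d != 0 and
    height d + |a_0| + ... + |a_d| equal to h, as tuples (a_0, ..., a_d)."""
    result = []
    for d in range(h):              # |a_d| >= 1 forces d <= h - 1
        budget = h - d              # weight left over for the coefficients
        for coeffs in product(range(-budget, budget + 1), repeat=d + 1):
            if coeffs[-1] != 0 and sum(abs(c) for c in coeffs) == budget:
                result.append(coeffs)
    return result


# Each height class is finite, so the union over all heights is countable.
for h in range(1, 6):
    print(h, len(polynomials_of_height(h)))
```

For example, $x^2 - 2$, stored as `(-2, 0, 1)`, has height $2 + 2 + 0 + 1 = 5$ and appears in the height-5 list. The search is exponential in $h$, which is irrelevant for a countability argument.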


Therefore we now know that there exist infinitely many, indeed uncountably many, transcendental numbers. Liouville in 1851 gave the first proof of the existence of transcendentals by exhibiting a few. One of his examples is the following.


Theorem 6.3.8 The real number

$$c = \sum_{j=1}^{\infty} \frac{1}{10^{j!}}$$

is transcendental.


Proof First of all, since $\frac{1}{10^{j!}} \le \frac{1}{10^j}$ and $\sum_{j=1}^{\infty} \frac{1}{10^j}$ is a convergent geometric series, it follows from the comparison test that the infinite series defining $c$ converges and defines a real number. Further, since $\sum_{j=1}^{\infty} \frac{1}{10^j} = \frac{1}{9}$, it follows that $c < \frac{1}{9} < 1$.
Suppose that $c$ is algebraic, so that $g(c) = 0$ for some nonzero rational polynomial $g(x)$. Multiplying through by the least common multiple of all the denominators in $g(x)$, we may suppose that $f(c) = 0$ for some integral polynomial $f(x) = \sum_{j=0}^{n} m_j x^j$. Then $c$ satisfies

$$\sum_{j=0}^{n} m_j c^j = 0$$

for some integers $m_0, \dots, m_n$.
If $0 < x < 1$ then by the triangle inequality

$$|f'(x)| = \Big| \sum_{j=1}^{n} j m_j x^{j-1} \Big| \le \sum_{j=1}^{n} j\,|m_j| = B,$$

where $B$ is a real constant depending only on the coefficients of $f(x)$.

Now let

$$c_k = \sum_{j=1}^{k} \frac{1}{10^{j!}}$$

be the $k$th partial sum for $c$. Then

$$|c - c_k| = \sum_{j=k+1}^{\infty} \frac{1}{10^{j!}} < \frac{2}{10^{(k+1)!}}.$$

Apply the Mean Value Theorem to $f(x)$ at $c$ and $c_k$ to obtain

$$|f(c) - f(c_k)| = |c - c_k|\,|f'(\xi)|$$

for some $\xi$ with $c_k < \xi < c < 1$. Now since $0 < \xi < 1$ we have

$$|c - c_k|\,|f'(\xi)| < \frac{2B}{10^{(k+1)!}}.$$

On the other hand, since $f(x)$ can have at most $n$ roots, it follows that for all $k$ large enough we have $f(c_k) \neq 0$. Since $f(c) = 0$ we have

$$|f(c) - f(c_k)| = |f(c_k)| = \Big| \sum_{j=0}^{n} m_j c_k^j \Big| \ge \frac{1}{10^{n \cdot k!}},$$

since for each $j$, $m_j c_k^j$ is a rational number whose denominator divides $10^{j \cdot k!}$, so that $f(c_k)$ is a nonzero rational number whose denominator divides $10^{n \cdot k!}$. However, if $n$ is fixed and $k$ is chosen sufficiently large, then $10^{(k+1)! - n \cdot k!} = 10^{k!\,(k+1-n)} > 2B$, that is,

$$\frac{1}{10^{n \cdot k!}} > \frac{2B}{10^{(k+1)!}},$$

contradicting the equality from the Mean Value Theorem. Therefore $c$ is transcendental.
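The two facts that drive the proof, that $c_k$ has denominator $10^{k!}$ while $c - c_k$ is smaller than $2/10^{(k+1)!}$, can be checked with exact rational arithmetic. In the Python sketch below the true value of $c$ is replaced by the longer partial sum $c_6$ as a stand-in, so the comparison illustrates the tail bound but of course does not prove it. The function name is our own.

```python
from fractions import Fraction
from math import factorial


def liouville_partial_sum(k):
    """Exact value of c_k = sum_{j=1}^{k} 10^(-j!)."""
    return sum((Fraction(1, 10 ** factorial(j)) for j in range(1, k + 1)), Fraction(0))


k = 3
c_k = liouville_partial_sum(k)               # 1/10 + 1/100 + 1/10^6
stand_in = liouville_partial_sum(6)          # c_6 used as a stand-in for c
gap = stand_in - c_k                         # 10^-24 + 10^-120 + 10^-720
print(c_k, c_k.denominator == 10 ** factorial(k))
print(gap < Fraction(2, 10 ** factorial(k + 1)))
```

Exact fractions are used because the gap, about $10^{-24}$ here, is far below what floating-point arithmetic can distinguish from $c_k$.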


After we discuss algebraic integers we will show that both $e$ and $\pi$ are transcendental. The transcendence of $e$ was first proved by Hermite in 1873, while Lindemann in 1882 proved the transcendence of $\pi$.


6.3.3 Symmetric Polynomials 


Many results on algebraic number fields and algebraic integers depend on the prop- 
erties of symmetric polynomials. These were briefly introduced and used in Section 
5.2.1. Here we look at them more carefully and present a fundamental result con- 
cerning them. 


Definition 6.3.6 Let $y_1, \dots, y_n$ be (independent) indeterminates over a field $F$. A polynomial $f(y_1, \dots, y_n) \in F[y_1, \dots, y_n]$ is a symmetric polynomial in $y_1, \dots, y_n$ if $f(y_1, \dots, y_n)$ is unchanged by any permutation $\sigma$ of $\{y_1, \dots, y_n\}$, that is, $f(y_1, \dots, y_n) = f(\sigma(y_1), \dots, \sigma(y_n))$.

If $F \subset F'$ are fields and $\alpha_1, \dots, \alpha_n$ are in $F'$, then we call a polynomial $f(\alpha_1, \dots, \alpha_n)$ with coefficients in $F$ symmetric in $\alpha_1, \dots, \alpha_n$ if $f(\alpha_1, \dots, \alpha_n)$ is unchanged by any permutation $\sigma$ of $\{\alpha_1, \dots, \alpha_n\}$.
EXAMPLE 6.3.3.1 Let $F$ be a field and $f_0, f_1 \in F$. Let $h(y_1, y_2) = f_0(y_1 + y_2) + f_1(y_1 y_2)$.

There are two permutations of $\{y_1, y_2\}$, namely $\sigma_1 : y_1 \mapsto y_1,\ y_2 \mapsto y_2$ and $\sigma_2 : y_1 \mapsto y_2,\ y_2 \mapsto y_1$.

Applying either one of these two to $\{y_1, y_2\}$ leaves $h(y_1, y_2)$ invariant. Therefore $h(y_1, y_2)$ is a symmetric polynomial.
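For a polynomial stored as a table of monomials, the definition can be tested mechanically: permuting the variables just permutes the positions in each exponent vector. The Python sketch below uses our own representation, a dict from exponent tuples to nonzero coefficients, and checks Example 6.3.3.1 with the sample values $f_0 = 3$, $f_1 = 5$.

```python
from itertools import permutations


def is_symmetric(poly, n):
    """poly maps exponent tuples (e_1, ..., e_n) to nonzero coefficients,
    representing sum c * y_1^e_1 ... y_n^e_n. Since all n! permutations are tried,
    it does not matter whether a permutation or its inverse acts on the exponents."""
    for perm in permutations(range(n)):
        permuted = {tuple(exps[perm[i]] for i in range(n)): c for exps, c in poly.items()}
        if permuted != poly:
            return False
    return True


# h(y1, y2) = 3*(y1 + y2) + 5*y1*y2, i.e. Example 6.3.3.1 with f0 = 3, f1 = 5
h = {(1, 0): 3, (0, 1): 3, (1, 1): 5}
print(is_symmetric(h, 2))                       # True
print(is_symmetric({(2, 0): 1, (0, 1): 1}, 2))  # y1^2 + y2 is not symmetric: False
```

The check costs $n!$ dictionary comparisons, which is fine for the small examples in this section.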


Definition 6.3.7 Let $x, y_1, \dots, y_n$ be indeterminates over a field $F$ (or elements of an extension field $F'$ of $F$). Form the polynomial

$$p(x, y_1, \dots, y_n) = (x - y_1) \cdots (x - y_n).$$

The $i$th elementary symmetric polynomial $s_i$ in $y_1, \dots, y_n$, for $i = 1, \dots, n$, is $(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in $p(x, y_1, \dots, y_n)$ as a polynomial in $x$ with coefficients from $F(y_1, \dots, y_n)$.


EXAMPLE 6.3.3.2 Consider $y_1, y_2, y_3$. Then

$$p(x, y_1, y_2, y_3) = (x - y_1)(x - y_2)(x - y_3) = x^3 - (y_1 + y_2 + y_3)x^2 + (y_1y_2 + y_1y_3 + y_2y_3)x - y_1y_2y_3.$$

Therefore the three elementary symmetric polynomials in $y_1, y_2, y_3$ over any field are

1. $s_1 = y_1 + y_2 + y_3$.
2. $s_2 = y_1y_2 + y_1y_3 + y_2y_3$.
3. $s_3 = y_1y_2y_3$.

In general, the pattern of the last example holds for $y_1, \dots, y_n$. That is,

$$\begin{aligned} s_1 &= y_1 + y_2 + \dots + y_n,\\ s_2 &= y_1y_2 + y_1y_3 + \dots + y_{n-1}y_n,\\ s_3 &= y_1y_2y_3 + y_1y_2y_4 + \dots + y_{n-2}y_{n-1}y_n,\\ &\ \ \vdots\\ s_n &= y_1 \cdots y_n. \end{aligned}$$
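The defining relation between the coefficients of $p(x, y_1, \dots, y_n)$ and the $s_i$ is easy to test numerically. The Python sketch below, with function names of our own, expands $(x - y_1)\cdots(x - y_n)$ for sample integer values and compares the coefficient of $x^{n-i}$ with $(-1)^i s_i$, where $s_i$ is computed as the sum of all products of $i$ of the values.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """s_k: the sum of all products of k of the values with increasing indices."""
    return sum(prod(c) for c in combinations(values, k))


def poly_from_roots(roots):
    """Coefficients of (x - r_1)...(x - r_n), highest power of x first."""
    coeffs = [1]
    for r in roots:
        # multiply the current polynomial by (x - r)
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


ys = [2, 3, 5]
print(poly_from_roots(ys))                                               # [1, -10, 31, -30]
print([(-1) ** i * elementary_symmetric(ys, i) for i in range(1, 4)])   # [-10, 31, -30]
```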


The importance of the elementary symmetric polynomials is that any symmetric polynomial can be built up from the elementary symmetric polynomials. We make this precise in the next theorem, called the fundamental theorem of symmetric polynomials. We will use this important result several times in our study of algebraic numbers and algebraic integers.


Theorem 6.3.9 (Fundamental Theorem of Symmetric Polynomials) If $P$ is a symmetric polynomial in the indeterminates $y_1, \dots, y_n$ over a field $F$, that is, $P \in F[y_1, \dots, y_n]$ and $P$ is symmetric, then there exists a unique $g \in F[y_1, \dots, y_n]$ such that $P(y_1, \dots, y_n) = g(s_1, \dots, s_n)$. That is, any symmetric polynomial in $y_1, \dots, y_n$ is a polynomial expression in the elementary symmetric polynomials in $y_1, \dots, y_n$.


In order to prove this result we need the concept of a piece. Any polynomial $f(x_1, \dots, x_n) \in F[x_1, \dots, x_n]$ is a sum of pieces of the form $a x_1^{i_1} \cdots x_n^{i_n}$ with $a \in F$. We first put an order on these pieces of a polynomial.

The piece $a x_1^{i_1} \cdots x_n^{i_n}$ with $a \neq 0$ is called higher than the piece $b x_1^{j_1} \cdots x_n^{j_n}$ with $b \neq 0$ if the first one of the differences

$$i_1 - j_1,\ i_2 - j_2,\ \dots,\ i_n - j_n$$

that differs from zero is in fact positive. The highest piece of a polynomial $f(x_1, \dots, x_n)$ is denoted by $HG(f)$.


Lemma 6.3.7 For nonzero $f(x_1, \dots, x_n), g(x_1, \dots, x_n) \in F[x_1, \dots, x_n]$ we have $HG(fg) = HG(f)HG(g)$.

Proof We use induction on $n$, the number of indeterminates. It is clearly true for $n = 1$; now assume that the statement holds for all polynomials in $k$ variables with $k < n$ and $n \ge 2$. Order the polynomials by the exponents of the first variable $x_1$, so that

$$f(x_1, \dots, x_n) = x_1^r \varphi_r(x_2, \dots, x_n) + x_1^{r-1}\varphi_{r-1}(x_2, \dots, x_n) + \dots + \varphi_0(x_2, \dots, x_n),$$
$$g(x_1, \dots, x_n) = x_1^s \psi_s(x_2, \dots, x_n) + x_1^{s-1}\psi_{s-1}(x_2, \dots, x_n) + \dots + \psi_0(x_2, \dots, x_n),$$

with $\varphi_r \neq 0$ and $\psi_s \neq 0$. Then

$$HG(fg) = x_1^{r+s} HG(\varphi_r \psi_s).$$

By the inductive hypothesis

$$HG(\varphi_r \psi_s) = HG(\varphi_r)HG(\psi_s).$$

Hence

$$HG(fg) = x_1^{r+s} HG(\varphi_r)HG(\psi_s) = \big(x_1^r HG(\varphi_r)\big)\big(x_1^s HG(\psi_s)\big) = HG(f)HG(g).$$
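Because the order on pieces is lexicographic in the exponent vectors, it coincides with Python's built-in tuple comparison, which makes Lemma 6.3.7 easy to spot-check. The sketch below uses our own dict-of-exponents representation (the names `multiply` and `highest_piece` are hypothetical), multiplies two sample polynomials in $x_1, x_2, x_3$, and compares highest pieces.

```python
def multiply(f, g):
    """Product of polynomials stored as {exponent tuple: nonzero coefficient}."""
    out = {}
    for e1, a in f.items():
        for e2, b in g.items():
            e = tuple(x + y for x, y in zip(e1, e2))
            out[e] = out.get(e, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}


def highest_piece(f):
    """HG(f) as (exponents, coefficient); max() compares tuples lexicographically."""
    exps = max(f)
    return exps, f[exps]


f = {(1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 1): 1}   # x1 + x2 + x3
g = {(1, 1, 0): 1, (0, 0, 2): -2}                # x1*x2 - 2*x3^2
print(highest_piece(f), highest_piece(g))        # ((1, 0, 0), 1) ((1, 1, 0), 1)
print(highest_piece(multiply(f, g)))             # ((2, 1, 0), 1)
```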


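In general the $k$th elementary symmetric polynomial is given by

$$s_k = \sum_{1 \le i_1 < i_2 < \dots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k},$$

where the sum is taken over all the $\binom{n}{k}$ different systems of indices $i_1, \dots, i_k$ with $i_1 < i_2 < \dots < i_k$. We need the following facts concerning highest pieces of symmetric polynomials and of the $s_k$.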


Lemma 6.3.8 In the highest piece $a x_1^{k_1} \cdots x_n^{k_n}$, $a \neq 0$, of a symmetric polynomial $s(x_1, \dots, x_n)$ we have $k_1 \ge k_2 \ge \dots \ge k_n$.

Proof Assume that $k_i < k_j$ for some $i < j$. As a symmetric polynomial, $s(x_1, \dots, x_n)$ must then also contain the piece obtained by interchanging $x_i$ and $x_j$, namely $a x_1^{k_1} \cdots x_i^{k_j} \cdots x_j^{k_i} \cdots x_n^{k_n}$, which is higher than $a x_1^{k_1} \cdots x_i^{k_i} \cdots x_j^{k_j} \cdots x_n^{k_n}$, giving a contradiction.
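Lemma 6.3.9 The product $s_1^{k_1 - k_2} s_2^{k_2 - k_3} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}$ with $k_1 \ge k_2 \ge \dots \ge k_n$ has the highest piece $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$.

Proof From the definition of the elementary symmetric polynomials and Lemma 6.3.7 we have

$$HG(s_k^t) = (x_1 x_2 \cdots x_k)^t, \quad 1 \le k \le n,\ t \ge 1.$$

From Lemma 6.3.7,

$$\begin{aligned} HG\big(s_1^{k_1-k_2} s_2^{k_2-k_3} \cdots s_{n-1}^{k_{n-1}-k_n} s_n^{k_n}\big) &= x_1^{k_1-k_2}(x_1x_2)^{k_2-k_3} \cdots (x_1 \cdots x_{n-1})^{k_{n-1}-k_n}(x_1 \cdots x_n)^{k_n}\\ &= x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}. \end{aligned}$$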


We can now prove the fundamental theorem of symmetric polynomials. 


Proof (Theorem 6.3.9) Let $s(x_1, \dots, x_n) \in F[x_1, \dots, x_n]$ be a symmetric polynomial. We must show that $s(x_1, \dots, x_n)$ can be uniquely expressed as a polynomial $f(s_1, \dots, s_n)$ in the elementary symmetric polynomials $s_1, \dots, s_n$ with coefficients from $F$. We prove the existence of the polynomial $f$ by induction on the size of the highest piece. If in the highest piece of a symmetric polynomial all exponents are zero, then it is constant, that is, an element of $F$, and there is nothing to prove.

Now we assume that each symmetric polynomial with highest piece smaller than that of $s(x_1, \dots, x_n)$ can be written as a polynomial in the elementary symmetric polynomials. Let $a x_1^{k_1} \cdots x_n^{k_n}$, $a \neq 0$, be the highest piece of $s(x_1, \dots, x_n)$; by Lemma 6.3.8, $k_1 \ge k_2 \ge \dots \ge k_n$. Let

$$t(x_1, \dots, x_n) = s(x_1, \dots, x_n) - a\, s_1^{k_1-k_2} s_2^{k_2-k_3} \cdots s_{n-1}^{k_{n-1}-k_n} s_n^{k_n}.$$

Clearly $t(x_1, \dots, x_n)$ is another symmetric polynomial, and from Lemma 6.3.9 the highest piece of $t(x_1, \dots, x_n)$ is smaller than that of $s(x_1, \dots, x_n)$. Therefore $t(x_1, \dots, x_n)$, and hence $s(x_1, \dots, x_n) = t(x_1, \dots, x_n) + a\, s_1^{k_1-k_2} \cdots s_{n-1}^{k_{n-1}-k_n} s_n^{k_n}$, can be written as a polynomial in $s_1, \dots, s_n$.

To prove the uniqueness of this expression assume that $s(x_1, \dots, x_n) = f(s_1, \dots, s_n) = r(s_1, \dots, s_n)$. Then $f(s_1, \dots, s_n) - r(s_1, \dots, s_n) = h(s_1, \dots, s_n)$ is the zero polynomial in $x_1, \dots, x_n$. Hence, if we write $h(s_1, \dots, s_n)$ as a sum of products of powers of $s_1, \dots, s_n$, all coefficients must vanish, because two different products of powers of $s_1, \dots, s_n$ have different highest pieces. This follows from Lemma 6.3.9. Therefore $f$ and $r$ are the same, proving the theorem.
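The existence part of the proof is effectively an algorithm: repeatedly subtract $a\, s_1^{k_1-k_2} \cdots s_n^{k_n}$ for the current highest piece until nothing is left. The Python sketch below implements that loop for polynomials with integer coefficients stored as dicts of exponent tuples (our own representation; the function names are hypothetical). It recovers the classical identities $x_1^2 + x_2^2 + x_3^2 = s_1^2 - 2s_2$ and $x_1^3 + x_2^3 + x_3^3 = s_1^3 - 3s_1s_2 + 3s_3$.

```python
from itertools import combinations


def multiply(f, g):
    """Product of polynomials stored as {exponent tuple: nonzero coefficient}."""
    out = {}
    for e1, a in f.items():
        for e2, b in g.items():
            e = tuple(x + y for x, y in zip(e1, e2))
            out[e] = out.get(e, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}


def elementary(k, n):
    """s_k in x_1, ..., x_n: one monomial per k-element index set."""
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), k)}


def express_in_elementary(s, n):
    """Return {(a_1, ..., a_n): coeff} with s = sum coeff * s_1^a_1 ... s_n^a_n,
    following the existence proof of Theorem 6.3.9."""
    s = dict(s)
    result = {}
    while s:
        k = max(s)                      # exponents of the highest piece
        a = s[k]
        exps = tuple(k[i] - k[i + 1] for i in range(n - 1)) + (k[n - 1],)
        if min(exps) < 0:               # cannot happen for symmetric input (Lemma 6.3.8)
            raise ValueError("polynomial is not symmetric")
        result[exps] = result.get(exps, 0) + a
        term = {(0,) * n: a}
        for i, e in enumerate(exps):
            for _ in range(e):
                term = multiply(term, elementary(i + 1, n))
        for m, c in term.items():       # s <- s - a * s_1^(k1-k2) ... s_n^(kn)
            s[m] = s.get(m, 0) - c
            if s[m] == 0:
                del s[m]
    return result


p2 = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}
p3 = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}
print(express_in_elementary(p2, 3))   # {(2, 0, 0): 1, (0, 1, 0): -2}
print(express_in_elementary(p3, 3))   # {(3, 0, 0): 1, (1, 1, 0): -3, (0, 0, 1): 3}
```

Each pass removes the current highest piece and only introduces lower pieces of the same total degree, so the loop terminates, mirroring the induction in the proof.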


From this theorem we obtain the following theorem, which is crucial in our study 
of both algebraic numbers in general and algebraic integers. 


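Theorem 6.3.10 Let $\alpha$ be an algebraic number and $\alpha_1, \dots, \alpha_n$ be its set of conjugates in $\mathbb{C}$. Then any symmetric polynomial in $\alpha_1, \dots, \alpha_n$ over $\mathbb{Q}$ is a rational number.

Proof Since $\alpha$ is algebraic we have $\operatorname{irr}(\alpha, \mathbb{Q}) \in \mathbb{Q}[x]$. Since $\alpha_1, \dots, \alpha_n$ are the conjugates of $\alpha$, $\operatorname{irr}(\alpha, \mathbb{Q})$ splits in $\mathbb{C}$ as

$$\operatorname{irr}(\alpha, \mathbb{Q}) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n).$$

Therefore the coefficients of $\operatorname{irr}(\alpha, \mathbb{Q})$ are, up to sign, precisely the elementary symmetric polynomials in the conjugates. Since $\operatorname{irr}(\alpha, \mathbb{Q}) \in \mathbb{Q}[x]$, it follows that every elementary symmetric polynomial in the conjugates of $\alpha$ is a rational number. By the fundamental theorem of symmetric polynomials, any symmetric polynomial over $\mathbb{Q}$ in $\alpha_1, \dots, \alpha_n$ is a polynomial with rational coefficients in these elementary symmetric values, and hence it is rational.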
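A quick numerical illustration, using floating-point complex arithmetic so that agreement is only up to rounding: the conjugates of $\alpha = \sqrt[3]{2}$ are $\sqrt[3]{2}\,\zeta^k$ with $\zeta = e^{2\pi i/3}$, $k = 0, 1, 2$, and $\operatorname{irr}(\alpha, \mathbb{Q}) = x^3 - 2$ gives $s_1 = s_2 = 0$ and $s_3 = 2$. Hence the symmetric polynomial $\sum_{i \neq j} \alpha_i^2 \alpha_j = s_1 s_2 - 3s_3$ must equal $-6$, even though two of the $\alpha_i$ are not real. The sample polynomial is our own choice.

```python
import cmath

# Conjugates of the real cube root of 2: the three roots of x^3 - 2.
alphas = [2 ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

# The symmetric polynomial sum_{i != j} a_i^2 a_j, which equals s1*s2 - 3*s3 = -6.
value = sum(alphas[i] ** 2 * alphas[j]
            for i in range(3) for j in range(3) if i != j)
print(value)   # approximately -6, up to floating-point rounding
```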


6.3.4 Discriminant and Norm 


We introduce certain complex numbers that will be used to further describe both 
algebraic numbers and algebraic number fields. We first must extend our definition 
of conjugate. 
Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$. Then $K$ has precisely $n$ embeddings $\sigma_i : K \to \mathbb{C}$ which fix $\mathbb{Q}$. These can be defined by $\sigma_i : 1 \mapsto 1,\ \theta \mapsto \theta_i$, where $\theta_i$ is a conjugate of $\theta$. Now let $\alpha \in K$ be of degree $m$. Since $|K : \mathbb{Q}(\alpha)|\,|\mathbb{Q}(\alpha) : \mathbb{Q}| = |K : \mathbb{Q}|$, it follows that $m \mid n$. Let $d = \frac{n}{m}$.


Definition 6.3.8 Let $K$ be an algebraic number field of degree $n$ and $\alpha \in K$ of degree $m$. Then the set of conjugates of $\alpha$ for $K$ is the set $\{\sigma_i(\alpha)\}$, where the $\sigma_i$ are the $n$ embeddings of $K$ into $\mathbb{C}$.


Lemma 6.3.10 Let $K$ be an algebraic number field of degree $n$ and $\alpha \in K$ of degree $m$. Then the set of conjugates of $\alpha$ for $K$ consists of the $m$ distinct conjugates of $\alpha$ in $\mathbb{C}$, each repeated $d = \frac{n}{m}$ times.


Proof On the set of $n$ embeddings $K \to \mathbb{C}$ fixing $\mathbb{Q}$ define the relation $\sigma \sim \tau$ if $\sigma(\alpha) = \tau(\alpha)$. This is an equivalence relation (see exercises). Each equivalence class has size $|K : \mathbb{Q}(\alpha)| = d$ and hence there are $m$ of them. Since each $\sigma(\alpha)$ is a conjugate of $\alpha$ in $\mathbb{C}$, it follows that the set $\{\sigma_i(\alpha)\}$ consists of the $m$ conjugates of $\alpha$ in $\mathbb{C}$, each repeated $d$ times.


Hence an $\alpha \in K$ always has $n$ conjugates for $K$, counted with repetition. By looking at degrees it follows that these conjugates will be distinct if and only if $K = \mathbb{Q}(\alpha)$. Next we define the discriminant of a basis.


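Definition 6.3.9 Let $K$ be an algebraic number field of degree $n$ and let $\alpha_1, \dots, \alpha_n$ be a basis for $K$ over $\mathbb{Q}$. For each $\alpha_i$ let $\alpha_{ij} = \sigma_j(\alpha_i)$, $j = 1, \dots, n$, be the $n$ conjugates of $\alpha_i$ for $K$, for a fixed ordering $\sigma_1, \dots, \sigma_n$ of the embeddings. Then the discriminant of the basis $\alpha_1, \dots, \alpha_n$ is

$$\Delta(\alpha_1, \dots, \alpha_n) = \big(\det(\alpha_{ij})\big)^2 = \begin{vmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n}\\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n}\\ \vdots & \vdots & & \vdots\\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{vmatrix}^2.$$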


Notice that if we change the ordering of the basis we permute the rows of the matrix $(\alpha_{ij})$ (and changing the ordering of the embeddings permutes its columns), which multiplies the determinant by $\pm 1$. Hence after squaring the determinant the value remains the same. Therefore the discriminant of a basis is independent of the ordering. Second, notice that if $\beta_1, \dots, \beta_n$ is another basis then

$$\Delta(\beta_1, \dots, \beta_n) = \big(\det(c_{ij})\big)^2\, \Delta(\alpha_1, \dots, \alpha_n),$$

where $(c_{ij})$ is the transition matrix. Since $\det(c_{ij})$ is a nonzero rational number, the discriminants of all bases have the same sign. We show below that the discriminant is a rational number.


Theorem 6.3.11 Let $K = \mathbb{Q}(\alpha)$ be an algebraic number field. Then the discriminant of any basis is rational and nonzero.


Proof Now $\Delta(\alpha_1, \dots, \alpha_n)$ is a symmetric function of $\alpha_1, \dots, \alpha_n$ and their conjugates, so by the results of the last section it follows that the discriminant is rational.




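Since $K = \mathbb{Q}(\alpha)$ it has a basis of the form $1, \alpha, \dots, \alpha^{n-1}$. If $\alpha_i = \sigma_i(\alpha)$ is a conjugate of $\alpha$ for $K$, then $\alpha_i^j = \sigma_i(\alpha^j)$ is the corresponding conjugate of $\alpha^j$. Therefore, if $\alpha_1 = \alpha, \dots, \alpha_n$ are the conjugates of $\alpha$ for $K$, we have

$$\Delta(1, \alpha, \dots, \alpha^{n-1}) = \begin{vmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1}\\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1} \end{vmatrix}^2,$$

where we have used the transpose of the matrix in Definition 6.3.9, which does not change the determinant.

This determinant is called the Vandermonde determinant and can be shown to have the value (see exercises)

$$V(\alpha) = \begin{vmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1}\\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1} \end{vmatrix} = \prod_{1 \le i < j \le n} (\alpha_j - \alpha_i).$$

Since $K = \mathbb{Q}(\alpha)$, the conjugates $\alpha_1, \dots, \alpha_n$ of $\alpha$ for $K$ are all distinct, so $V(\alpha) \neq 0$ and hence $\Delta(1, \alpha, \dots, \alpha^{n-1}) \neq 0$. Since the discriminant of one basis is nonzero and the transition matrix between two bases has nonzero determinant, the discriminant of any basis is nonzero, completing the proof of the theorem.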
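For $K = \mathbb{Q}(\sqrt[3]{2})$ and the basis $1, \alpha, \alpha^2$ with $\alpha = \sqrt[3]{2}$, the discriminant should equal the discriminant $-108$ of $x^3 - 2$ (for a cubic $x^3 + ax + b$ this is $-4a^3 - 27b^2$, a standard formula that is not derived in the text). The Python sketch below computes it in two ways: as the squared Vandermonde determinant, via a small Leibniz-formula determinant, and from the product formula. It uses floating-point complex numbers, so the two values agree with $-108$ only up to rounding; the helper `det` is our own.

```python
import cmath
from itertools import permutations
from math import prod


def det(m):
    """Determinant by the Leibniz formula (fine for small matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        total += (-1) ** inversions * prod(m[i][perm[i]] for i in range(n))
    return total


# Conjugates of alpha = cube root of 2 for K = Q(alpha): the roots of x^3 - 2.
conj = [2 ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

vandermonde = [[a ** j for j in range(3)] for a in conj]     # rows (1, a_i, a_i^2)
disc_det = det(vandermonde) ** 2
disc_prod = prod(conj[j] - conj[i] for i in range(3) for j in range(i + 1, 3)) ** 2
print(disc_det, disc_prod)   # both approximately -108
```

The Leibniz formula is used only because it is short; for larger matrices Gaussian elimination would be the usual choice.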


As part of our discussion of algebraic integers in the next section we will look at 
bases which have minimal discriminant and from these define the discriminant not 
only of a particular basis but as an invariant of the whole field K. 

We next define two further concepts. 


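Definition 6.3.10 Suppose $\alpha \in K$, where $K$ is an algebraic number field of degree $n$. Let

$$\alpha_1 = \sigma_1(\alpha), \dots, \alpha_n = \sigma_n(\alpha)$$

be the conjugates of $\alpha$ for $K$, where the $\sigma_i$ are the $n$ embeddings of $K$ into $\mathbb{C}$. Then the norm of $\alpha$ in $K$ is

$$N_K(\alpha) = \alpha_1 \alpha_2 \cdots \alpha_n.$$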


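This definition agrees with our previous definition of norm in $\mathbb{Z}[i]$. If $\alpha \in \mathbb{Z}[i] \subset \mathbb{Q}(i) = K$, then its conjugates for $K$ are $\alpha$ and its complex conjugate $\bar{\alpha}$. To see this, notice that if $\alpha = a + bi \in \mathbb{Z}[i]$ then $p(\alpha) = 0$, where $p(x) = (x - \alpha)(x - \bar{\alpha}) = x^2 - 2ax + a^2 + b^2 \in \mathbb{Q}[x]$. If $\alpha \notin \mathbb{Z}$ then $p(x) = \operatorname{irr}(\alpha, \mathbb{Q})$; if $\alpha \in \mathbb{Z}$ then $\alpha = \bar{\alpha}$ is simply repeated twice. Hence $N_K(\alpha) = \alpha\bar{\alpha} = a^2 + b^2$, which agrees with the previous definition. We will discuss quadratic integers and their norms more completely in the next section. In $\mathbb{Z}[i]$ the norm was multiplicative and always had rational value. In general: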


Lemma 6.3.11 (1) $N_K(\alpha)$ is a rational number for $\alpha \in K$.
(2) If $\alpha, \beta$ are in the algebraic number field $K$ then

$$N_K(\alpha\beta) = N_K(\alpha)N_K(\beta).$$

Proof If $\alpha_1, \dots, \alpha_n$ are the conjugates of $\alpha$ for $K$, then the norm $N_K(\alpha)$ is a symmetric function of $\alpha_1, \dots, \alpha_n$ and hence rational.
If $\beta_1 = \sigma_1(\beta), \dots, \beta_n = \sigma_n(\beta)$ are the conjugates of $\beta$ for $K$, then, since each $\sigma_i$ is multiplicative, $\alpha_1\beta_1, \dots, \alpha_n\beta_n$ are the conjugates of $\alpha\beta$ for $K$. It follows that $N_K(\alpha\beta) = N_K(\alpha)N_K(\beta)$.


Finally, if $\alpha \in K$ for an algebraic number field $K$, we define the trace of $\alpha$ in $K$ as $\operatorname{tr}_K(\alpha) = \alpha_1 + \dots + \alpha_n$, where $\alpha_1 = \sigma_1(\alpha), \dots, \alpha_n = \sigma_n(\alpha)$ are the conjugates of $\alpha$ for $K$.

Now let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$. For $\alpha \in K$ define the mapping $T_\alpha : K \to K$ by

$$T_\alpha(x) = \alpha x.$$

This is a linear transformation of the $n$-dimensional $\mathbb{Q}$-vector space $K$ (see exercises) and therefore is given by an $n \times n$ matrix. This matrix is related to the trace and norm in the following manner.


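Theorem 6.3.12 Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$ and let $\alpha \in K$. Then, if $T_\alpha$ is the linear transformation defined above,

1. $N_K(\alpha) = \det(T_\alpha)$,
2. $\operatorname{tr}_K(\alpha) = \operatorname{tr}(T_\alpha)$.

Let $f_\alpha(t) = \det(tI - T_\alpha)$ be the characteristic polynomial of $T_\alpha$ and let $p_\alpha(t) = \operatorname{irr}(\alpha, \mathbb{Q})$. Theorem 6.3.12 will then follow from the next two lemmas. Notice that, since $T_{\alpha\beta} = T_\alpha T_\beta$ and $T_{\alpha+\beta} = T_\alpha + T_\beta$, the multiplicativity of the norm and the additivity of the trace follow directly from this matrix formulation.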


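Lemma 6.3.12 Let $K$ be an algebraic number field of degree $n$ and $\alpha \in K$ of degree $m$. Let $d = \frac{n}{m}$ and suppose that $f_\alpha(t)$ and $p_\alpha(t)$ are as above. Then

$$f_\alpha(t) = \big(p_\alpha(t)\big)^d.$$

Proof Let $p_\alpha(t) = t^m + c_{m-1}t^{m-1} + \dots + c_0$. Now $\{1, \alpha, \alpha^2, \dots, \alpha^{m-1}\}$ is a basis for $\mathbb{Q}(\alpha)$ over $\mathbb{Q}$. Let $\omega_1, \dots, \omega_d$ be a basis for $K$ over $\mathbb{Q}(\alpha)$. Then

$$\{\omega_1, \omega_1\alpha, \dots, \omega_1\alpha^{m-1}, \dots, \omega_d, \omega_d\alpha, \dots, \omega_d\alpha^{m-1}\}$$

is a basis of $K$ over $\mathbb{Q}$. The matrix of the linear transformation $T_\alpha$ with respect to this basis has the block-diagonal form

$$\begin{pmatrix} M & 0 & \cdots & 0\\ 0 & M & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & M \end{pmatrix} \quad (d \text{ blocks}),$$

where

$$M = \begin{pmatrix} 0 & 0 & \cdots & 0 & -c_0\\ 1 & 0 & \cdots & 0 & -c_1\\ 0 & 1 & \cdots & 0 & -c_2\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & -c_{m-1} \end{pmatrix}.$$

The characteristic polynomial of $M$ is

$$\det(tI - M) = t^m + c_{m-1}t^{m-1} + \dots + c_0 = p_\alpha(t).$$

Then from the form of the matrix for $T_\alpha$ we have $f_\alpha(t) = (p_\alpha(t))^d$.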
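Lemma 6.3.13 Let $\sigma$ run through all the embeddings of $K$ into $\mathbb{C}$ which fix $\mathbb{Q}$. Then:

1. $f_\alpha(t) = \prod_\sigma (t - \sigma(\alpha))$,
2. $\operatorname{tr}_K(\alpha) = \sum_\sigma \sigma(\alpha)$,
3. $N_K(\alpha) = \prod_\sigma \sigma(\alpha)$.

Proof As before, the embeddings of $K$ into $\mathbb{C}$ fall into $m$ equivalence classes. Let $\sigma_1, \dots, \sigma_m$ be a set of representatives. Then

$$p_\alpha(t) = \prod_{i=1}^{m} (t - \sigma_i(\alpha))$$

and from the previous lemma

$$f_\alpha(t) = \Big(\prod_{i=1}^{m} (t - \sigma_i(\alpha))\Big)^d = \prod_{i=1}^{m} \prod_{\sigma \sim \sigma_i} (t - \sigma(\alpha)) = \prod_\sigma (t - \sigma(\alpha)).$$

This proves part (1), and the other two parts follow directly from the definitions of trace and norm in terms of the $\sigma$. Since $\operatorname{tr}(T_\alpha)$ is minus the coefficient of $t^{n-1}$ in $f_\alpha(t)$ and $\det(T_\alpha) = (-1)^n f_\alpha(0)$, part (1) then gives Theorem 6.3.12.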
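Theorem 6.3.12 and Lemma 6.3.12 can be checked on $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ of degree $4$, whose four embeddings send $\sqrt{2} \mapsto \pm\sqrt{2}$ and $\sqrt{3} \mapsto \pm\sqrt{3}$ independently (a standard fact we assume here). The Python sketch below builds the matrix of $T_\alpha$ in the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ with exact rationals and compares its determinant and trace with the product and sum of the four conjugates, which are computed in floating point. For $\alpha = \sqrt{2}$ the matrix consists of two copies of the block $\begin{pmatrix} 0 & 2\\ 1 & 0 \end{pmatrix}$, the matrix $M$ of Lemma 6.3.12 for $p_\alpha(t) = t^2 - 2$, with $d = 2$. The names and the coordinate representation are our own.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod, sqrt

# Basis 1, sqrt2, sqrt3, sqrt6 of K = Q(sqrt2, sqrt3).
# TABLE[i][j] = (c, k) means basis_i * basis_j = c * basis_k.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (2, 0), (1, 3), (2, 2)],
    [(1, 2), (1, 3), (3, 0), (3, 1)],
    [(1, 3), (2, 2), (3, 1), (6, 0)],
]


def multiplication_matrix(alpha):
    """Matrix of T_alpha(x) = alpha * x; column j holds the coordinates of alpha * basis_j."""
    m = [[Fraction(0)] * 4 for _ in range(4)]
    for i, a in enumerate(alpha):
        for j in range(4):
            c, k = TABLE[i][j]
            m[k][j] += c * Fraction(a)
    return m


def det(m):
    """Exact determinant by the Leibniz formula."""
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        total += (-1) ** inversions * prod(m[i][perm[i]] for i in range(n))
    return total


def conjugates(alpha):
    """The four images a + s2*b*sqrt2 + s3*c*sqrt3 + s2*s3*e*sqrt6 as floats."""
    a, b, c, e = (float(x) for x in alpha)
    return [a + s2 * b * sqrt(2) + s3 * c * sqrt(3) + s2 * s3 * e * sqrt(6)
            for s2, s3 in product((1, -1), repeat=2)]


for alpha in [(0, 1, 0, 0), (1, 1, 1, 0)]:       # sqrt2 and 1 + sqrt2 + sqrt3
    T = multiplication_matrix(alpha)
    trace = sum(T[i][i] for i in range(4))
    print(alpha, det(T), prod(conjugates(alpha)), trace, sum(conjugates(alpha)))
```

For $1 + \sqrt{2} + \sqrt{3}$ both sides give norm $-8$ and trace $4$; the exact determinant avoids any doubt about rounding on the matrix side.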


6.4 Algebraic Integers 


We now look at integers in an algebraic number field. 


Definition 6.4.1 An algebraic integer is a complex number $\alpha$ that is a root of a monic integral polynomial. That is, $\alpha \in \mathbb{C}$ is an algebraic integer if there exists $f(x) \in \mathbb{Z}[x]$ with $f(x) = x^n + b_{n-1}x^{n-1} + \dots + b_0$, $b_i \in \mathbb{Z}$, $n \ge 1$, and $f(\alpha) = 0$.

An algebraic integer is clearly an algebraic number. Hence there exists $p(x) = \operatorname{irr}(\alpha, \mathbb{Q})$.


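Lemma 6.4.1 If $\alpha \in \mathbb{C}$ is an algebraic integer, then all its conjugates $\alpha_1, \dots, \alpha_n$ over $\mathbb{Q}$ are also algebraic integers.

Proof Let $f(x) \in \mathbb{Z}[x]$ be a monic polynomial with $f(\alpha) = 0$. Let $p(x) = \operatorname{irr}(\alpha, \mathbb{Q})$ and let $\alpha_1, \dots, \alpha_n$ be the conjugates of $\alpha$. Since $p(x) = \operatorname{irr}(\alpha, \mathbb{Q}) = \operatorname{irr}(\alpha_i, \mathbb{Q}) = p_{\alpha_i}(x)$ for $i = 1, \dots, n$, we have $p_{\alpha_i}(x) \mid f(x)$ for $i = 1, \dots, n$. Hence $f(\alpha_i) = 0$ for $i = 1, \dots, n$.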


Lemma 6.4.2 $\alpha \in \mathbb{C}$ is an algebraic integer if and only if $\operatorname{irr}(\alpha, \mathbb{Q}) \in \mathbb{Z}[x]$.

Proof If $\operatorname{irr}(\alpha, \mathbb{Q}) \in \mathbb{Z}[x]$ then $\alpha$ is an algebraic integer directly from the definition, since $\operatorname{irr}(\alpha, \mathbb{Q})$ is monic.
To prove the converse we need the concept of a primitive integral polynomial. This is a polynomial $p(x) \in \mathbb{Z}[x]$ such that the gcd of all its coefficients is $1$. The following can be proved (see exercises):
(1) If $f(x)$ and $g(x)$ are primitive then so is $f(x)g(x)$.
(2) If $f(x) \in \mathbb{Z}[x]$ is monic then it is primitive.
(3) If $f(x) \in \mathbb{Q}[x]$ then there exists a rational number $c$ such that $f(x) = c f_1(x)$ with $f_1(x)$ primitive.
Now suppose $f(x) \in \mathbb{Z}[x]$ is a monic polynomial with $f(\alpha) = 0$. Let $p(x) = \operatorname{irr}(\alpha, \mathbb{Q})$. Then $p(x)$ divides $f(x)$, so $f(x) = p(x)q(x)$ with $q(x) \in \mathbb{Q}[x]$.
Let $p(x) = c_1 p_1(x)$ with $p_1(x)$ primitive and let $q(x) = c_2 q_1(x)$ with $q_1(x)$ primitive. Then

$$f(x) = c\, p_1(x) q_1(x), \qquad c = c_1 c_2.$$

Since $f(x)$ is monic it is primitive, and $p_1(x)q_1(x)$ is primitive by (1); hence $c = \pm 1$ and $f(x) = \pm p_1(x)q_1(x)$.

Since $p_1(x)$ and $q_1(x)$ are integral and their product has leading coefficient $\pm 1$, the leading coefficient of $p_1(x)$ is $\pm 1$. Since $p(x) = c_1 p_1(x)$ and $p(x)$ is monic, it follows that $c_1 = \pm 1$ and hence $p(x) = \pm p_1(x)$. Therefore $p(x) = \operatorname{irr}(\alpha, \mathbb{Q})$ is integral.


We now show the close ties between algebraic integers and rational integers. 


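Lemma 6.4.3 If $\alpha$ is an algebraic integer and also rational then it is a rational integer.

Proof If $\alpha \in \mathbb{Q}$ then $\operatorname{irr}(\alpha, \mathbb{Q}) = x - \alpha$. But if $\alpha$ is also an algebraic integer then $\operatorname{irr}(\alpha, \mathbb{Q}) \in \mathbb{Z}[x]$ by Lemma 6.4.2. Hence $x - \alpha \in \mathbb{Z}[x]$ and $\alpha \in \mathbb{Z}$.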


The following ties algebraic numbers in general to corresponding algebraic integers. Notice that if $q \in \mathbb{Q}$ then there exists a rational integer $n \neq 0$ such that $nq \in \mathbb{Z}$. This result generalizes this simple idea.

Theorem 6.4.1 If $\theta$ is an algebraic number then there exists a rational integer $r \neq 0$ such that $r\theta$ is an algebraic integer.




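Proof Since $\theta$ is an algebraic number there exists a nonzero $p(x) \in \mathbb{Z}[x]$ with $p(\theta) = 0$. Suppose $p(x) = a_n x^n + a_{n-1}x^{n-1} + \dots + a_0$ with $a_i \in \mathbb{Z}$ and $a_n \neq 0$. Then

$$a_n\theta^n + a_{n-1}\theta^{n-1} + \dots + a_0 = 0.$$

Multiplying through by $a_n^{n-1}$ and setting $\varphi = a_n\theta$, we obtain

$$\varphi^n + a_{n-1}\varphi^{n-1} + a_n a_{n-2}\varphi^{n-2} + \dots + a_n^{n-1}a_0 = 0.$$

Let $q(x) = x^n + a_{n-1}x^{n-1} + a_n a_{n-2}x^{n-2} + \dots + a_n^{n-1}a_0$. Then from the above $q(\varphi) = 0$, and since $q(x)$ is monic and integral, $\varphi = a_n\theta$ is an algebraic integer.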
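The construction in the proof is mechanical: the coefficient $a_k$ of $x^k$ is replaced by $a_k a_n^{n-1-k}$ and the leading coefficient by $1$. The Python sketch below, with function names of our own, carries this out and checks it on $p(x) = 2x^2 - 3x - 1$, whose root $\theta = (3 + \sqrt{17})/4$ gives the algebraic integer $2\theta = (3 + \sqrt{17})/2$, a root of $x^2 - 3x - 2$.

```python
def monic_for_scaled_root(coeffs):
    """coeffs = [a_n, a_{n-1}, ..., a_0] with integer a_i and a_n != 0.
    Returns [1, a_{n-1}, a_n*a_{n-2}, ..., a_n^(n-1)*a_0], the monic q(x) from the
    proof of Theorem 6.4.1, which satisfies q(a_n * theta) = 0 whenever p(theta) = 0."""
    a_n = coeffs[0]
    if a_n == 0:
        raise ValueError("leading coefficient must be nonzero")
    # coeffs[i] is a_{n-i}; it is multiplied by a_n^(n-1-(n-i)) = a_n^(i-1)
    return [1] + [coeffs[i] * a_n ** (i - 1) for i in range(1, len(coeffs))]


def evaluate(coeffs, x):
    """Horner evaluation, highest-degree coefficient first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


p = [2, -3, -1]                        # 2x^2 - 3x - 1
q = monic_for_scaled_root(p)           # [1, -3, -2]
theta = (3 + 17 ** 0.5) / 4            # a root of p
print(q, evaluate(p, theta), evaluate(q, 2 * theta))   # both values are ~0
```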


6.4.1 The Ring of Algebraic Integers 


We saw that the set $\mathcal{A}$ of all algebraic numbers is a subfield of $\mathbb{C}$. We now show that the set $\mathcal{I}$ of all algebraic integers forms a subring of $\mathcal{A}$. First we extend Theorem 6.3.10 from algebraic numbers to algebraic integers.


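Lemma 6.4.4 Suppose $\alpha_1, \dots, \alpha_n$ are the conjugates over $\mathbb{Q}$ of an algebraic integer $\alpha$. Then any integral symmetric function of $\alpha_1, \dots, \alpha_n$ is a rational integer.

Proof By Lemma 6.4.2 we have $\operatorname{irr}(\alpha, \mathbb{Q}) = (x - \alpha_1) \cdots (x - \alpha_n) \in \mathbb{Z}[x]$. Hence the elementary symmetric functions of $\alpha_1, \dots, \alpha_n$ are rational integers. It follows from the fundamental theorem of symmetric polynomials that any integral symmetric function is also a rational integer. Here we use that, when the coefficients are integers, the proof of that theorem only subtracts integer multiples of products of the $s_i$, so the resulting polynomial in $s_1, \dots, s_n$ has integer coefficients.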


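Theorem 6.4.2 The set $\mathcal{I}$ of all algebraic integers forms a subring of $\mathcal{A}$.

Proof Clearly it suffices to show that if $\alpha, \beta$ are algebraic integers then so are $\alpha + \beta$, $\alpha - \beta$ and $\alpha\beta$. Let $\alpha_1 = \alpha, \dots, \alpha_n$ be the conjugates of $\alpha$ and $\beta_1 = \beta, \dots, \beta_m$ the conjugates of $\beta$. Let

$$f(x) = \prod_{i=1}^{n} \prod_{j=1}^{m} \big(x - (\alpha_i + \beta_j)\big) = x^{nm} + d_{nm-1}x^{nm-1} + \dots + d_0.$$

The coefficients $d_k$ are integral symmetric functions in the $\alpha_i$ and in the $\beta_j$, and therefore from the remarks above we have $d_k \in \mathbb{Z}$. It follows that $f(x) \in \mathbb{Z}[x]$ and further $f(\alpha + \beta) = 0$. Therefore $\alpha + \beta$ is an algebraic integer. We treat $\alpha - \beta$ and $\alpha\beta$ analogously.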
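For $\alpha = \sqrt{2}$ (conjugates $\pm\sqrt{2}$) and $\beta = \sqrt{3}$ (conjugates $\pm\sqrt{3}$), the polynomial $f(x)$ of the proof is $\prod (x - (\pm\sqrt{2} \pm \sqrt{3})) = x^4 - 10x^2 + 1$, and the analogous product for $\alpha\beta$ is $(x^2 - 6)^2 = x^4 - 12x^2 + 36$. The Python sketch below expands these products in floating point and rounds to the nearest integers. The rounding tolerance is our own assumption, and the lists of conjugates are supplied by hand rather than computed.

```python
from math import sqrt


def poly_from_roots(roots):
    """Coefficients of prod (x - r), highest power first."""
    coeffs = [1]
    for r in roots:
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def integer_poly_from_roots(roots, tol=1e-6):
    """Expand prod (x - r) and round; fail if a coefficient is not close to an integer."""
    coeffs = poly_from_roots(roots)
    rounded = [round(c) for c in coeffs]
    if any(abs(c - r) > tol for c, r in zip(coeffs, rounded)):
        raise ValueError("coefficients are not (numerically) integers")
    return rounded


alpha_conj = [sqrt(2), -sqrt(2)]
beta_conj = [sqrt(3), -sqrt(3)]
sums = [a + b for a in alpha_conj for b in beta_conj]
products = [a * b for a in alpha_conj for b in beta_conj]
print(integer_poly_from_roots(sums))       # [1, 0, -10, 0, 1]
print(integer_poly_from_roots(products))   # [1, 0, -12, 0, 36]
```

Both results are monic with integer coefficients, as the theorem predicts, so $\sqrt{2} + \sqrt{3}$ and $\sqrt{6}$ are algebraic integers.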


We note that $\mathcal{A}$, the field of algebraic numbers, is precisely the quotient field of the ring of algebraic integers.

Now let $K = \mathbb{Q}(\theta)$ be an algebraic number field and let $\mathcal{O}_K = K \cap \mathcal{I}$. Then $\mathcal{O}_K$ forms a subring of $K$ called the algebraic integers, or just integers, of $K$. Further analysis of the proof of Theorem 6.4.1 shows that each $\beta \in K$ can be written as

$$\beta = \frac{\alpha}{r}$$

with $\alpha \in \mathcal{O}_K$ and $r \in \mathbb{Z}$, $r \neq 0$.
We now look at the norms of algebraic integers.


Lemma 6.4.5 If $\alpha \in \mathcal{O}_K$ then $N(\alpha) = N_K(\alpha)$ is a rational integer.

Proof $N(\alpha) = \alpha_1 \cdots \alpha_n$, where $\alpha_1 = \sigma_1(\alpha), \dots, \alpha_n = \sigma_n(\alpha)$ are the conjugates of $\alpha$ for $K$. But this is an integral symmetric function of the conjugates, and by Lemma 6.4.4 it is a rational integer.


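Lemma 6.4.6 Let $K = \mathbb{Q}(\theta)$ be an algebraic number field. Then $\alpha \in \mathcal{O}_K$ is a unit in $\mathcal{O}_K$ if and only if $N(\alpha) = \pm 1$.

Proof If $\alpha\beta = 1$ with $\beta \in \mathcal{O}_K$, then $1 = N(\alpha\beta) = N(\alpha)N(\beta)$. But $N(\alpha)$ and $N(\beta)$ are rational integers, so $|N(\alpha)| = |N(\beta)| = 1$.

Conversely suppose $N(\alpha) = \pm 1$. If $\alpha = \alpha_1$ and $\alpha_2, \dots, \alpha_n$ are the conjugates of $\alpha$ for $K$, then

$$\alpha_1 \cdots \alpha_n = \pm 1, \quad \text{so} \quad \alpha\,(\pm\alpha_2 \cdots \alpha_n) = 1.$$

Since $K$ is a field, $\alpha^{-1} = \pm\alpha_2 \cdots \alpha_n \in K$. But $\alpha_2 \cdots \alpha_n$ is a product of algebraic integers (Lemma 6.4.1), hence an algebraic integer by Theorem 6.4.2, so $\alpha^{-1} \in K \cap \mathcal{I} = \mathcal{O}_K$. Hence $\alpha$ is a unit in $\mathcal{O}_K$.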
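In $K = \mathbb{Q}(\sqrt{2})$ we have $2 \equiv 2 \bmod 4$, so by Theorem 6.4.7 below $\mathcal{O}_K = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$ and $N(a + b\sqrt{2}) = a^2 - 2b^2$. Lemma 6.4.6 then says the units are exactly the solutions of $a^2 - 2b^2 = \pm 1$, and its proof shows the inverse is $\pm$ the other conjugate. The Python sketch below lists the units with $|a|, |b| \le 3$ by a brute-force search (the box size is our own choice) and checks the inverse formula; the function names are hypothetical.

```python
def norm(a, b, d=2):
    """N(a + b*sqrt(d)) = (a + b*sqrt(d))(a - b*sqrt(d)) = a^2 - d*b^2."""
    return a * a - d * b * b


def multiply(x, y, d=2):
    """(a + b sqrt d)(c + e sqrt d) = (ac + d*be) + (ae + bc) sqrt d."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)


def units_in_box(bound, d=2):
    """All a + b*sqrt(d) with |a|, |b| <= bound and norm +1 or -1."""
    return [(a, b) for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1) if norm(a, b, d) in (1, -1)]


units = units_in_box(3)
print(units)
for a, b in units:
    n = norm(a, b)
    inverse = (n * a, -n * b)           # N(alpha) times the conjugate a - b*sqrt2
    assert multiply((a, b), inverse) == (1, 0)
```

For instance $1 + \sqrt{2}$ has norm $-1$ and inverse $-1 + \sqrt{2}$.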


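Based on the multiplicativity of the norm we obtain prime factorizations (not necessarily unique) in any algebraic number ring $\mathcal{O}_K$. Notice first that there are no primes at all in $\mathcal{I}$, the set of all algebraic integers. If $\alpha \in \mathcal{I}$ then $\alpha = \sqrt{\alpha}\,\sqrt{\alpha}$, where $\sqrt{\alpha} \in \mathbb{C}$. However, if $p(\alpha) = 0$ for a monic $p(x) \in \mathbb{Z}[x]$, then $p_1(\sqrt{\alpha}) = 0$, where $p_1(x) = p(x^2)$ is again monic. Hence $\sqrt{\alpha}$ is also an algebraic integer, and it is not a unit when $\alpha$ is not. Since this is true for any $\alpha \in \mathcal{I}$, there is always a nontrivial factorization and hence no $\alpha$ can be prime.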

From now on $K$ will denote an algebraic number field and $\mathcal{O}_K$ its ring of integers.


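Lemma 6.4.7 If $\alpha \in \mathcal{O}_K$ and $N(\alpha) = \pm p$, where $p$ is a rational prime, then $\alpha$ is a prime in $\mathcal{O}_K$.

Proof Suppose $\alpha = \beta\gamma$. Then $N(\alpha) = N(\beta)N(\gamma)$. Since all of these are rational integers and $|N(\alpha)| = p$ is prime, we must have either $|N(\beta)| = 1$ or $|N(\gamma)| = 1$, from which it follows by Lemma 6.4.6 that either $\beta$ or $\gamma$ is a unit.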


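Theorem 6.4.3 Let $K$ be an algebraic number field and $\mathcal{O}_K$ its ring of integers. Then each $\alpha \in \mathcal{O}_K$ is either $0$, a unit, or can be factored into a product of primes.

Proof Suppose $\alpha \neq 0$ is not a unit. Then $N(\alpha)$ is a nonzero rational integer with $|N(\alpha)| \neq 1$, so $|N(\alpha)| \ge 2$. We do an induction on $|N(\alpha)|$. If $|N(\alpha)| = 2$ then $\alpha$ is prime by Lemma 6.4.7. Suppose $|N(\alpha)| > 2$. If $\alpha$ is prime there is nothing to prove. Otherwise $\alpha = \beta\gamma$ with neither $\beta$ nor $\gamma$ a unit, and then $|N(\beta)| < |N(\alpha)|$ and $|N(\gamma)| < |N(\alpha)|$. From the inductive hypothesis it follows that both $\beta$ and $\gamma$ have prime factorizations, and hence so does $\alpha$.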
We stress again that the prime factorization need not be unique. However, from the existence of a prime factorization we can mimic Euclid's original proof (see Chapter 2) to obtain:


Corollary 6.4.1 There exist infinitely many primes in $\mathcal{O}_K$ for any algebraic number ring $\mathcal{O}_K$.
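The standard example showing that prime factorizations need not be unique is $K = \mathbb{Q}(\sqrt{-5})$. Since $-5 \equiv 3 \bmod 4$, Theorem 6.4.7 below gives $\mathcal{O}_K = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$ with $N(a + b\sqrt{-5}) = a^2 + 5b^2$, and

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$

The four factors have norms $4, 9, 6, 6$. No element has norm $2$ or $3$, so by the multiplicativity of the norm none of the factors is a product of two non-units, that is, each is a prime in the sense used here. The only units are $\pm 1$, so $2$ and $3$ are not unit multiples of $1 \pm \sqrt{-5}$. The Python sketch below verifies the norm computations by a finite search, which is exhaustive because $a^2 + 5b^2 = n$ bounds $|a|$ and $|b|$; the helper names are our own.

```python
def norm(a, b):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    return a * a + 5 * b * b


def multiply(x, y):
    """(a + b sqrt(-5))(c + e sqrt(-5)) = (ac - 5be) + (ae + bc) sqrt(-5)."""
    a, b = x
    c, e = y
    return (a * c - 5 * b * e, a * e + b * c)


def has_element_of_norm(n):
    """Exhaustive search: a^2 + 5b^2 = n forces |a|, |b| <= sqrt(n)."""
    bound = int(n ** 0.5) + 1
    return any(norm(a, b) == n
               for a in range(-bound, bound + 1) for b in range(-bound, bound + 1))


two, three = (2, 0), (3, 0)
p, q = (1, 1), (1, -1)                          # 1 + sqrt(-5) and 1 - sqrt(-5)
print(multiply(two, three), multiply(p, q))     # (6, 0) (6, 0)
print([norm(*x) for x in (two, three, p, q)])   # [4, 9, 6, 6]
print(has_element_of_norm(2), has_element_of_norm(3))   # False False
```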


6.4.2 Integral Bases 


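If $K$ has degree $n$ over $\mathbb{Q}$, we show that there exist $\omega_1, \dots, \omega_n$ in $\mathcal{O}_K$ such that each $\alpha \in \mathcal{O}_K$ is expressible as

$$\alpha = m_1\omega_1 + \dots + m_n\omega_n,$$

where $m_1, \dots, m_n \in \mathbb{Z}$.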


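Definition 6.4.2 An integral basis for $\mathcal{O}_K$ is a set of integers

$$\omega_1, \dots, \omega_t \in \mathcal{O}_K$$

such that each $\alpha \in \mathcal{O}_K$ can be expressed uniquely as

$$\alpha = m_1\omega_1 + \dots + m_t\omega_t,$$

where $m_1, \dots, m_t \in \mathbb{Z}$.
We show first that there must exist an integral basis.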


Theorem 6.4.4 Let $\mathcal{O}_K$ be the ring of integers in the algebraic number field $K$ of degree $n$ over $\mathbb{Q}$. Then there exists at least one integral basis for $\mathcal{O}_K$.


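Proof Since $K$ has degree $n$ there is a basis $\omega_1, \dots, \omega_n$ for $K$ over $\mathbb{Q}$. Each $\omega_i$ is algebraic, so by Theorem 6.4.1 for each $i$ there is a rational integer $r_i \neq 0$ such that $r_i\omega_i \in \mathcal{O}_K$. Multiplying through by $r = r_1 \cdots r_n$ we have $r\omega_1, \dots, r\omega_n$ all in $\mathcal{O}_K$. These are clearly still independent, so they still constitute a vector space basis of $K$ over $\mathbb{Q}$. It follows that $K$ has bases (as a vector space) consisting entirely of elements of $\mathcal{O}_K$. Further, if $\omega_1, \dots, \omega_n$ is such a basis for $K$ with all $\omega_i \in \mathcal{O}_K$, then the discriminant $\Delta(\omega_1, \dots, \omega_n)$ must be a rational integer: it is rational by Theorem 6.3.11, and it is an integral polynomial in the conjugates of the $\omega_i$, which are algebraic integers, so Lemma 6.4.3 applies.

Among all bases of $K$ which lie in $\mathcal{O}_K$ choose one, say $\omega_1, \dots, \omega_n$, with $|\Delta(\omega_1, \dots, \omega_n)|$ minimal. This exists since these values are positive rational integers. We claim that this is an integral basis for $\mathcal{O}_K$.

Let $\alpha \in \mathcal{O}_K$. Since $\alpha \in K$ and $\omega_1, \dots, \omega_n$ is a basis over $\mathbb{Q}$,

$$\alpha = q_1\omega_1 + \dots + q_n\omega_n$$

with $q_i \in \mathbb{Q}$. We show that each $q_i$ must be a rational integer. Suppose that $q_1$ is not a rational integer. Then $q_1 = m_1 + r_1$ with $m_1 \in \mathbb{Z}$ and $0 < r_1 < 1$. Consider now the set

$$\omega_1^* = (q_1 - m_1)\omega_1 + q_2\omega_2 + \dots + q_n\omega_n = \alpha - m_1\omega_1, \qquad \omega_i^* = \omega_i \ \text{ for } i \neq 1.$$

Since $\alpha, \omega_1 \in \mathcal{O}_K$, all the $\omega_i^*$ lie in $\mathcal{O}_K$. The transition matrix from $\omega_1, \dots, \omega_n$ to $\omega_1^*, \dots, \omega_n^*$ is

$$C = \begin{pmatrix} q_1 - m_1 & q_2 & \cdots & q_n\\ 0 & 1 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

This has determinant $q_1 - m_1 = r_1 > 0$, so $\omega_1^*, \dots, \omega_n^*$ is another basis consisting solely of integers. Its discriminant is

$$\Delta(\omega_1^*, \dots, \omega_n^*) = r_1^2\, \Delta(\omega_1, \dots, \omega_n).$$

Since $0 < r_1 < 1$ this implies that

$$|\Delta(\omega_1^*, \dots, \omega_n^*)| < |\Delta(\omega_1, \dots, \omega_n)|,$$

contradicting the minimality of $|\Delta(\omega_1, \dots, \omega_n)|$. Therefore $q_1 = m_1 \in \mathbb{Z}$. The other coefficients follow in the same manner.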


Therefore $\mathcal{O}_K$ has at least one integral basis. We next show that the cardinality of any integral basis is the same as the degree of $K$.


Theorem 6.4.5 Let $\mathcal{O}_K$ be the ring of integers in the algebraic number field $K$ of degree $n$ over $\mathbb{Q}$. Then any integral basis for $\mathcal{O}_K$ is also a basis for $K$ over $\mathbb{Q}$. Hence the cardinality of any integral basis is the same as the degree of $K$. Further, all integral bases have the same discriminant.


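Proof Let $\omega_1, \dots, \omega_t$ be an integral basis and suppose $\alpha \in K$. Then by Theorem 6.4.1 there exists an $r \in \mathbb{Z}$, $r \neq 0$, with $r\alpha \in \mathcal{O}_K$. Hence

$$r\alpha = m_1\omega_1 + \dots + m_t\omega_t \quad \text{with } m_i \in \mathbb{Z}.$$

Then

$$\alpha = \frac{m_1}{r}\omega_1 + \dots + \frac{m_t}{r}\omega_t.$$

Therefore $\omega_1, \dots, \omega_t$ span $K$ as a vector space over $\mathbb{Q}$. We must show that they are independent over $\mathbb{Q}$.
Suppose $q_1\omega_1 + \dots + q_t\omega_t = 0$. Then, multiplying through by the lcm of the denominators of the $q_i$, we obtain $m_1\omega_1 + \dots + m_t\omega_t = 0$ for some $m_i \in \mathbb{Z}$. Since $\omega_1, \dots, \omega_t$ is an integral basis, the representation $0 = 0\omega_1 + \dots + 0\omega_t$ is unique, so each $m_i = 0$. But then each $q_i = 0$, and therefore $\omega_1, \dots, \omega_t$ are independent and hence form a basis.

It then follows that $t = n$, where $n = |K : \mathbb{Q}|$.

Now let $\omega_1, \dots, \omega_n$ and $\varphi_1, \dots, \varphi_n$ be two integral bases. The transition matrix $C = (c_{ij})$ expressing the $\varphi_i$ in terms of the $\omega_j$ is rational integral, and

$$\Delta(\varphi_1, \dots, \varphi_n) = \big(\det(c_{ij})\big)^2\, \Delta(\omega_1, \dots, \omega_n).$$

It follows that $\Delta(\omega_1, \dots, \omega_n)$ divides $\Delta(\varphi_1, \dots, \varphi_n)$. Reversing the roles we get that $\Delta(\varphi_1, \dots, \varphi_n)$ divides $\Delta(\omega_1, \dots, \omega_n)$, and since the two discriminants have the same sign, $\Delta(\omega_1, \dots, \omega_n) = \Delta(\varphi_1, \dots, \varphi_n)$.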


Definition 6.4.3 The discriminant $d_K$ of an algebraic number field $K$ is the common value of the discriminants of all integral bases of its ring of integers $\mathcal{O}_K$.


For some later work in this section we need the following result whose proof we 
will give in Section 6.5 after we introduce some material on ideals. 


Theorem 6.4.6 If $K$ has degree $n$ over $\mathbb{Q}$ then each nonzero ideal $I \subset \mathcal{O}_K$ has an integral basis of rank $n$. That is, there exist $\omega_1, \dots, \omega_n \in I$ such that any $\alpha \in I$ can be expressed uniquely as

$$\alpha = m_1\omega_1 + \dots + m_n\omega_n$$

with $m_i \in \mathbb{Z}$. In particular any ideal in $\mathcal{O}_K$ is finitely generated of rank $\le n$.

In particular this implies that the index $[\mathcal{O}_K : I]$ is finite for every nonzero ideal $I$. Then for a nonzero ideal $I$ in $\mathcal{O}_K$ we define the discriminant $d(I)$ of $I$ analogously via an integral basis of $I$. This certainly exists, and the value $d(I)$ is independent of the chosen integral basis of $I$. Since the index $[\mathcal{O}_K : I]$ is finite we have $d(I) = [\mathcal{O}_K : I]^2\, d_K$.
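As an illustration in $\mathcal{O}_K = \mathbb{Z}[i]$, where $d_K = -4$ by Theorem 6.4.7 below (since $-1 \equiv 3 \bmod 4$): the ideal $(1 + i)$ consists of the $x + yi$ with $x \equiv y \bmod 2$ and has $\mathbb{Z}$-basis $1 + i, 2$, while $2\mathbb{Z}[i]$ has $\mathbb{Z}$-basis $2, 2i$. The Python sketch below computes $d(I)$ from the two embeddings of $\mathbb{Q}(i)$ (the identity and complex conjugation) and computes the index as the absolute value of the determinant of the coordinates of the basis with respect to $1, i$, a standard fact about lattices that we assume here. It then compares $d(I)$ with $[\mathcal{O}_K : I]^2 d_K$. The function names are our own.

```python
def ideal_discriminant(basis):
    """(det of the conjugate matrix)^2 for a Z-basis (w1, w2) of an ideal of Z[i].
    Rows are basis elements, columns the embeddings z -> z and z -> conj(z)."""
    w1, w2 = basis
    det = w1 * w2.conjugate() - w1.conjugate() * w2
    return det * det


def index_in_gaussian_integers(basis):
    """|det| of the coordinate matrix with respect to the integral basis 1, i."""
    w1, w2 = basis
    return abs(w1.real * w2.imag - w1.imag * w2.real)


d_K = -4
for basis in [(1 + 1j, 2 + 0j), (2 + 0j, 2j)]:
    disc = ideal_discriminant(basis)
    index = index_in_gaussian_integers(basis)
    print(basis, disc, index, index ** 2 * d_K)
```

For the basis $1 + i, 2$ this gives $d(I) = -16$ with index $2$, and for $2, 2i$ it gives $d(I) = -64$ with index $4$, matching $d(I) = [\mathcal{O}_K : I]^2 d_K$ in both cases.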


6.4.3 Quadratic Fields and Quadratic Integers 


We now look more closely at quadratic fields. These are algebraic number fields $K$ of degree $2$. The Gaussian rationals $\mathbb{Q}(i)$ are an example. Let $K = \mathbb{Q}(\theta)$ with $|K : \mathbb{Q}| = 2$. Then $\theta$ satisfies a degree $2$ integral polynomial $p(x) = ax^2 + bx + c$. Let $d = b^2 - 4ac$ be the discriminant of this polynomial. Then $\sqrt{d} = \pm(2a\theta + b)$, so clearly $\mathbb{Q}(\sqrt{d}) \subset \mathbb{Q}(\theta)$. Moreover $d$ is not a perfect square, since otherwise $\theta$ would be rational, and hence it follows by degrees that $\mathbb{Q}(\sqrt{d}) = \mathbb{Q}(\theta)$. Further, if $d = m^2 d_1$ then $\mathbb{Q}(\sqrt{d}) = \mathbb{Q}(\sqrt{d_1})$. It follows from these comments that any quadratic field $K$ has the form $\mathbb{Q}(\sqrt{d})$ for some squarefree integer $d$. In the following we always consider $d$ to be squarefree. If $d > 0$ then $K$ is called a real quadratic field, while if $d < 0$ it is an imaginary quadratic field. In both cases $\{1, \sqrt{d}\}$ is a basis for $K$ over $\mathbb{Q}$.
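The integers in $\mathbb{Q}(\sqrt{d})$ are called quadratic integers and we characterize them. Suppose $\alpha \in \mathcal{O}_K$ is a quadratic integer. Since $\alpha \in K$ we have $\alpha = q_1 + q_2\sqrt{d}$ with $q_1, q_2 \in \mathbb{Q}$. Since $\mathrm{irr}(\alpha, \mathbb{Q})$ is a monic rational integral polynomial of degree 2 we have

$$\mathrm{irr}(\alpha, \mathbb{Q}) = (x - \alpha)(x - \bar{\alpha}) = x^2 - (\alpha + \bar{\alpha})x + \alpha\bar{\alpha} \in \mathbb{Z}[x],$$

where $\bar{\alpha} = q_1 - q_2\sqrt{d}$. It follows that $\alpha \in \mathcal{O}_K$ if and only if its trace and norm are both rational integers:

$$\mathrm{tr}_K(\alpha) = \alpha + \bar{\alpha} = 2q_1 \in \mathbb{Z},$$

$$N_K(\alpha) = \alpha\bar{\alpha} = q_1^2 - dq_2^2 \in \mathbb{Z}.$$

Now

$$(2q_2)^2 d = (2q_1)^2 - 4(q_1^2 - q_2^2 d) \in \mathbb{Z} \implies 2q_2 \in \mathbb{Z},$$

since $d$ is squarefree. Therefore $q_1 = \frac{m}{2}$, $q_2 = \frac{n}{2}$ for rational integers $m, n$ and

$$\alpha = \frac{m}{2} + \frac{n}{2}\sqrt{d} \quad \text{with } m, n \in \mathbb{Z}.$$

Further, since $N_K(\alpha) = \frac{m^2 - n^2 d}{4} \in \mathbb{Z}$,

$$m^2 - n^2 d \equiv 0 \pmod 4.$$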


If $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$ this congruence is solved only if $m, n$ are even, or equivalently $q_1, q_2 \in \mathbb{Z}$.

If $d \equiv 1 \bmod 4$ then $m^2 - dn^2 \equiv 0 \bmod 4$ is equivalent to $m \equiv n \bmod 2$.
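The parity analysis above can be checked mechanically. The Python sketch below runs over all $m, n$ modulo 4 and records which parity pairs $(m \bmod 2, n \bmod 2)$ solve $m^2 - n^2 d \equiv 0 \pmod 4$. The value of $m^2 - n^2 d$ modulo 4 depends only on $m, n, d$ modulo 4, so this finite search covers every case. The function name `solutions_mod4` is ours and purely illustrative.

```python
def solutions_mod4(d_res):
    """Return the set of parity pairs (m mod 2, n mod 2) with m^2 - n^2*d ≡ 0 (mod 4),
    where d_res is the residue of d modulo 4."""
    sols = set()
    for m in range(4):
        for n in range(4):
            if (m * m - n * n * d_res) % 4 == 0:
                sols.add((m % 2, n % 2))
    return sols

for r in (1, 2, 3):
    print(r, sorted(solutions_mod4(r)))
# 1 [(0, 0), (1, 1)]   -> d ≡ 1 mod 4: m ≡ n mod 2
# 2 [(0, 0)]           -> d ≡ 2 mod 4: m, n both even
# 3 [(0, 0)]           -> d ≡ 3 mod 4: m, n both even
```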

It follows that the integers in $\mathcal{O}_K$ can be described by:
(1) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$.
(2) If $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.

From this characterization it follows that if $d$ is not congruent to 1 mod 4, every integer in $\mathcal{O}_K$ can be written as $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$. In other words $\{1, \sqrt{d}\}$ is an integral basis.

If $d \equiv 1 \bmod 4$ let $\omega = \frac{1 + \sqrt{d}}{2}$. Then from the characterization every integer in $\mathcal{O}_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and $\{1, \omega\}$ is an integral basis (see exercises). We summarize all this discussion in the next theorem.


Theorem 6.4.7 Let $K$ be a quadratic field. Then:
(1) $K = \mathbb{Q}(\sqrt{d})$ for some squarefree rational integer $d$.

(2) The integers in $K$ can be characterized as
(a) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$;
(b) if $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.

(3) An integral basis for $\mathcal{O}_K$ is given by
(a) $\{1, \sqrt{d}\}$ if $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$;
(b) $\{1, \omega\}$, where $\omega = \frac{1 + \sqrt{d}}{2}$, if $d \equiv 1 \bmod 4$.

(4) The discriminant of $K = \mathbb{Q}(\sqrt{d})$ is
(a) $4d$ if $d \equiv 2, 3 \bmod 4$;
(b) $d$ if $d \equiv 1 \bmod 4$.


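Proof Everything was explained prior to the theorem except part (4). If $d \equiv 2, 3 \bmod 4$ then $\{1, \sqrt{d}\}$ is an integral basis. Then

$$\Delta(1, \sqrt{d}) = \begin{vmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{vmatrix}^2 = (-2\sqrt{d})^2 = 4d.$$

If $d \equiv 1 \bmod 4$ then $\{1, \omega\}$ is an integral basis and

$$\Delta(1, \omega) = \begin{vmatrix} 1 & \frac{1 + \sqrt{d}}{2} \\ 1 & \frac{1 - \sqrt{d}}{2} \end{vmatrix}^2 = (-\sqrt{d})^2 = d.$$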
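The discriminants in part (4) can also be computed with exact rational arithmetic, with no square roots at all. The sketch below relies on the standard identity $\Delta(w_1, w_2) = \det(\mathrm{tr}(w_iw_j))$. This identity follows from $\Delta = \det(A)^2 = \det(AA^t)$ with $A = (\sigma_j(w_i))$, and it is not stated in the text above. An element $a + b\sqrt{d}$ is stored as the pair `(a, b)`, and all helper names are ours.

```python
from fractions import Fraction

# An element a + b*sqrt(d) of Q(sqrt(d)) is stored as the pair (a, b).
def mul(x, y, d):
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)

def trace(x):
    # tr(a + b sqrt d) = (a + b sqrt d) + (a - b sqrt d) = 2a
    return 2 * x[0]

def disc(basis, d):
    """Discriminant of a 2-element basis, computed as det(tr(w_i w_j))."""
    (t11, t12), (t21, t22) = [[trace(mul(u, v, d)) for v in basis] for u in basis]
    return t11 * t22 - t12 * t21

half = Fraction(1, 2)
print(disc([(1, 0), (0, 1)], 3))        # basis {1, sqrt 3}: 4d = 12
print(disc([(1, 0), (half, half)], 5))  # basis {1, (1 + sqrt 5)/2}: d = 5
```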


Theorem 6.4.8 Suppose that $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and $d$ squarefree is a quadratic imaginary number field. If $d \neq -1, -3$ then the only units in $\mathcal{O}_K$ are $\pm 1$. If $d = -1$ the units are $\pm 1, \pm i$, while if $d = -3$ the units are $\pm 1, \pm\omega, \pm\bar{\omega}$, where

$$\omega = \frac{-1 + \sqrt{-3}}{2}.$$


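Proof As we have seen, $\alpha \in \mathcal{O}_K$ is a unit if and only if $|N(\alpha)| = 1$. Let $\alpha$ be a unit in $\mathcal{O}_K$. Then $\alpha = x + y\sqrt{d}$, or (only possible if $d \equiv 1 \bmod 4$) $\alpha = \frac{x + y\sqrt{d}}{2}$, with $x, y \in \mathbb{Z}$. Correspondingly $N(\alpha) = x^2 - dy^2$ or $N(\alpha) = \frac{x^2 - dy^2}{4}$.

Since $d < 0$, $x^2 - dy^2 > 0$ for $\alpha \neq 0$, so a unit satisfies $x^2 - dy^2 = 1$ or $x^2 - dy^2 = 4$, respectively. If $d < -1$ the only solutions to $x^2 - dy^2 = 1$ are $x = \pm 1$, $y = 0$.

Our analysis of the Gaussian integers showed that if $d = -1$ then $\pm i$ are also units.

If $d < -3$ then the only solutions to $x^2 - dy^2 = 4$ are $x = \pm 2$, $y = 0$, again giving the result.

Finally if $d = -3$ we see by computation that $\pm\omega$ and $\pm\bar{\omega}$ are also units (see exercises and note that $\omega^3 = 1$).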
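The case analysis in this proof amounts to a finite search, which the following Python sketch carries out. Every element of $\mathcal{O}_K$ is written as $\frac{x + y\sqrt{d}}{2}$. Integrality then means $x \equiv y \bmod 2$ when $d \equiv 1 \bmod 4$, and $x, y$ both even otherwise (Theorem 6.4.7). Units are exactly the solutions of $x^2 - dy^2 = 4$. The function name is ours.

```python
def imaginary_quadratic_units(d):
    """Units of O_K for K = Q(sqrt d), d < 0 squarefree, as pairs (x, y)
    meaning (x + y*sqrt d)/2.  A unit has norm (x^2 - d*y^2)/4 = 1."""
    assert d < 0
    units = []
    # x^2 - d*y^2 = 4 with -d >= 1 forces |x| <= 2 and |y| <= 2
    for x in range(-2, 3):
        for y in range(-2, 3):
            if x * x - d * y * y != 4:
                continue
            if d % 4 == 1:
                ok = (x - y) % 2 == 0           # half-integral elements allowed
            else:
                ok = x % 2 == 0 and y % 2 == 0  # only m + n*sqrt d, m, n in Z
            if ok:
                units.append((x, y))
    return units

for d in (-1, -2, -3, -5, -7, -11):
    print(d, len(imaginary_quadratic_units(d)))
```

The counts printed are 4 for $d = -1$, 6 for $d = -3$ and 2 in the remaining cases, in agreement with the theorem.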


Theorem 6.4.9 In any real quadratic field there are infinitely many units.

Proof The equation $x^2 - dy^2 = 1$ for $d > 0$ and $x, y \in \mathbb{Z}$ is called Pell's equation. If $d > 1$ is not a perfect square, we will show in Section 6.4.6 that this equation has infinitely many solutions. Since each such $\alpha = x + y\sqrt{d}$ is an integer in $\mathcal{O}_K$ with $N(\alpha) = 1$, it follows that $\mathcal{O}_K$ has infinitely many units.
In the real quadratic case the units can be built up from one special unit called a fundamental unit.


Theorem 6.4.10 Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d > 0$ and squarefree. Then in $\mathcal{O}_K$ there exists a special unit $\epsilon_0$, called the fundamental unit, such that all units in $\mathcal{O}_K$ are given by

$$\pm\epsilon_0^n, \quad n = 0, \pm 1, \pm 2, \ldots.$$

This is a special case of a general result called Dirichlet's unit theorem, which we will present in Section 6.4.6.

Now what can be said about primes and prime factorization for quadratic integers? We saw in Section 6.4.2 that there is always a prime factorization. However our example in $\mathbb{Q}(\sqrt{-5})$ shows that this is not always unique. Since there is a norm in every $\mathcal{O}_K$, the first question to ask is when (the absolute value of) this norm is a Euclidean norm, so that $\mathcal{O}_K$ is a Euclidean domain. From the results in Section 6.2 this would imply unique factorization. We have already seen that the Gaussian integers are Euclidean. We state several results concerning these questions (see [Ri]).


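Theorem 6.4.11 Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and $d$ squarefree is a quadratic imaginary number field. Then $\mathcal{O}_K$ is Euclidean if and only if

$$d = -1, -2, -3, -7, -11.$$

We let $\mathcal{O}_d$ stand for $\mathcal{O}_K$ when $K = \mathbb{Q}(\sqrt{d})$. The rings

$$\mathcal{O}_{-1}, \mathcal{O}_{-2}, \mathcal{O}_{-3}, \mathcal{O}_{-7}, \mathcal{O}_{-11}$$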


are called the Euclidean quadratic imaginary number rings. They and matrix 
groups with entries from them have been investigated extensively (see [F] 
and [FR 1]). 

In the real case we have the following. 


Theorem 6.4.12 The real quadratic fields $K = \mathbb{Q}(\sqrt{d})$ for which $\mathcal{O}_K$ is Euclidean with respect to the absolute value of the norm are those with

$$d = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73.$$


Recall from Section 6.2.3 that being a principal ideal domain always implies unique factorization. Gauss conjectured that there are only finitely many quadratic imaginary number fields whose rings of integers are principal ideal domains. This was finally proven through the work of Heegner, Baker, and Stark; in fact there are exactly nine such fields, listed in the next theorem.


Theorem 6.4.13 Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ is a quadratic imaginary number field. Then $\mathcal{O}_K$ is a principal ideal domain if and only if

$$d = -1, -2, -3, -7, -11, -19, -43, -67, -163.$$
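Theorem 6.4.13 lies far beyond the methods of this book, but the list can at least be checked numerically. The Python sketch below assumes a classical fact that is not proved here. For imaginary quadratic $K$, the ring $\mathcal{O}_K$ is a principal ideal domain exactly when there is only one reduced primitive binary quadratic form $ax^2 + bxy + cy^2$ of discriminant $b^2 - 4ac = d_K$, with $d_K$ as in Theorem 6.4.7. The number of such forms is the class number. A form is reduced when $|b| \le a \le c$, with $b \ge 0$ whenever $|b| = a$ or $a = c$. The function names are ours.

```python
from math import gcd

def field_discriminant(d):
    """d_K for K = Q(sqrt d), d squarefree (Theorem 6.4.7)."""
    return d if d % 4 == 1 else 4 * d

def class_number(D):
    """Count reduced primitive forms (a, b, c) with b^2 - 4ac = D < 0."""
    h = 0
    a = 1
    while 3 * a * a <= -D:               # reduced forms satisfy 3a^2 <= |D|
        for b in range(-a + 1, a + 1):   # -a < b <= a
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (c == a and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                h += 1
        a += 1
    return h

pid_list = [-1, -2, -3, -7, -11, -19, -43, -67, -163]
print([class_number(field_discriminant(d)) for d in pid_list])  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(class_number(field_discriminant(-5)))  # 2: Z[sqrt(-5)] is not a PID
```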
It has been conjectured that there are infinitely many real quadratic fields whose rings of integers are principal ideal domains.

In the case where $\mathcal{O}_K$ does have unique factorization we can analyze the primes exactly as we analyzed the Gaussian primes in Theorem 6.2.6. We state the following and leave the proof to the exercises.


Theorem 6.4.14 Suppose $K$ is a quadratic field and suppose $\mathcal{O}_K$ is a unique factorization domain. Then

(1) To each prime $\pi \in \mathcal{O}_K$ there corresponds one and only one rational prime $p$ such that $\pi \mid p$.

(2) Any rational prime $p$ is either a prime in $\mathcal{O}_K$ or a product $\pi_1\pi_2$ of two primes (not necessarily distinct) from $\mathcal{O}_K$. In the latter case, if $\pi_1$ and $\pi_2$ are not associates we say $p$ is decomposed. If $\pi_1$ and $\pi_2$ are associates, so that $p = u\pi_1^2$ for a unit $u$, we say the rational prime is ramified.

(3) All primes in $\mathcal{O}_K$ are either rational primes or the two factors of rational primes (and their associates).
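Part (2) gives a concrete test in the imaginary fields of Theorem 6.4.13. A rational prime $p$ is not prime in $\mathcal{O}_K$ exactly when $N(\pi) = p$ for some $\pi \in \mathcal{O}_K$. If $p = \pi_1\pi_2$ with neither factor a unit, then $N(\pi_1)N(\pi_2) = p^2$ forces $N(\pi_1) = p$. Conversely, $N(\pi) = p$ gives $p = \pi\bar{\pi}$ with $\pi, \bar{\pi}$ non-units. The Python sketch below searches for such $\pi$, written as $\frac{x + y\sqrt{d}}{2}$ with the integrality conditions of Theorem 6.4.7. It assumes $d < 0$ and $\mathcal{O}_K$ a unique factorization domain, and the function names are ours.

```python
from math import isqrt

def has_element_of_norm(d, n):
    """Is there alpha in O_K, K = Q(sqrt d) with d < 0, such that N(alpha) = n?
    Write alpha = (x + y*sqrt d)/2, so N(alpha) = (x^2 - d*y^2)/4."""
    target = 4 * n
    for y in range(isqrt(target // -d) + 1):
        rest = target + d * y * y          # must equal x^2
        if rest < 0:
            continue
        x = isqrt(rest)
        if x * x != rest:
            continue
        if d % 4 == 1:
            if (x - y) % 2 == 0:
                return True
        elif x % 2 == 0 and y % 2 == 0:
            return True
    return False

def is_prime_in_OK(d, p):
    """Assumes d < 0 and O_K a UFD: p stays prime iff no element has norm p."""
    return not has_element_of_norm(d, p)

print([p for p in (2, 3, 5, 7, 11, 13) if is_prime_in_OK(-1, p)])  # [3, 7, 11]
print([p for p in (2, 3, 5, 7, 11, 13) if is_prime_in_OK(-3, p)])  # [2, 5, 11]
```

For the Gaussian integers this recovers the familiar fact that exactly the primes $p \equiv 3 \bmod 4$ remain prime.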


6.4.4 The Transcendence of e and π


There are infinitely many transcendental numbers (see Section 6.3.2); however, so far we have exhibited only one particular number explicitly as transcendental.

Here we show that the fundamental constants $e$ and $\pi$ are also transcendental. The transcendence of $e$ was established first by Hermite in 1873, while Lindemann in 1882 proved the transcendence of $\pi$.


Theorem 6.4.15 e is a transcendental number, that is, transcendental over Q. 


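Proof We use some complex analysis. Let $f(x) \in \mathbb{R}[x]$ with the degree of $f(x)$ equal to $m \ge 1$. Let $z_1 \in \mathbb{C}$, $z_1 \neq 0$, and $\gamma : [0, 1] \to \mathbb{C}$, $\gamma(t) = tz_1$. Let

$$I(z_1) = \int_\gamma e^{z_1 - z} f(z)\,dz = \int_0^{z_1} e^{z_1 - z} f(z)\,dz.$$

By $\int_0^{z_1}$ we mean the integral from 0 to $z_1$ along $\gamma$. Recall that

$$\int_0^{z_1} e^{z_1 - z} f(z)\,dz = -f(z_1) + e^{z_1} f(0) + \int_0^{z_1} e^{z_1 - z} f'(z)\,dz.$$

It follows then by repeated partial integration that

$$(1) \qquad I(z_1) = e^{z_1} \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z_1).$$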
Let $|f|(x)$ be the polynomial that we get if we replace the coefficients of $f(x)$ by their absolute values. Since $|e^{z_1 - z}| \le e^{|z_1 - z|} \le e^{|z_1|}$ for $z$ on $\gamma$, we get

$$(2) \qquad |I(z_1)| \le |z_1|\, e^{|z_1|}\, |f|(|z_1|).$$


Now assume that $e$ is an algebraic number, that is,

$$(3) \qquad q_0 + q_1 e + \cdots + q_n e^n = 0$$

for some $n \ge 1$ and integers $q_0 \neq 0, q_1, \ldots, q_n$, where the greatest common divisor of $q_0, q_1, \ldots, q_n$ is equal to 1.


We consider now the polynomial $f(x) = x^{p-1}(x - 1)^p \cdots (x - n)^p$ with $p$ a sufficiently large prime number, and we consider $I(z_1)$ with respect to this polynomial. Let

$$J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n).$$

From (1) and (3) we get that

$$J = -\sum_{j=0}^{m} \sum_{k=0}^{n} q_k f^{(j)}(k),$$

where $m = (n + 1)p - 1$, since $(q_0 + q_1 e + \cdots + q_n e^n)\left(\sum_{j=0}^{m} f^{(j)}(0)\right) = 0$.

Now, $f^{(j)}(k) = 0$ if $j < p$, $k > 0$, and if $j < p - 1$ then also for $k = 0$. Hence $f^{(j)}(k)$ is an integer that is divisible by $p!$ for all $j, k$ except for $j = p - 1$, $k = 0$. Further, $f^{(p-1)}(0) = (p - 1)!(-1)^{np}(n!)^p$, and hence, if $p > n$, then $f^{(p-1)}(0)$ is an integer divisible by $(p - 1)!$ but not by $p!$.

It follows that $J$ is a nonzero integer that is divisible by $(p - 1)!$ if $p > |q_0|$ and $p > n$. So let $p > n$ and $p > |q_0|$, so that $|J| \ge (p - 1)!$.

Now, $|f|(k) \le (2n)^m$ for $0 \le k \le n$. Together with (2) we then get that

$$|J| \le |q_1|\, e\, |f|(1) + \cdots + |q_n|\, n e^n\, |f|(n) \le c^p$$

for a number $c$ independent of $p$. It follows that

$$(p - 1)! \le |J| \le c^p,$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le c\,\frac{c^{p-1}}{(p - 1)!}.$$

This gives a contradiction, since $\frac{c^{p-1}}{(p - 1)!} \to 0$ as $p \to \infty$. Therefore, $e$ is transcendental.
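The arithmetic heart of this proof is the divisibility claim about the derivatives $f^{(j)}(k)$. The Python sketch below checks it for small $n$ and $p$ with exact integer polynomial arithmetic. It verifies that $p! \mid f^{(j)}(k)$ for all $j, k$ except $j = p - 1$, $k = 0$. It also verifies that $f^{(p-1)}(0) = (p-1)!(-1)^{np}(n!)^p$ and that this value is not divisible by $p!$ when $p > n$. It is a sanity check for chosen parameters, not a proof, and the helper names are ours.

```python
from math import factorial

def poly_mul(a, b):
    """Multiply integer polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def derivative(c):
    return [i * c[i] for i in range(1, len(c))] or [0]

def evaluate(c, x):
    value = 0
    for coeff in reversed(c):          # Horner's rule
        value = value * x + coeff
    return value

def hermite_poly(n, p):
    """f(x) = x^(p-1) (x-1)^p ... (x-n)^p as in the proof of Theorem 6.4.15."""
    f = [0] * (p - 1) + [1]
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [-k, 1])
    return f

def check_divisibility(n, p):
    f = hermite_poly(n, p)
    m = (n + 1) * p - 1
    deriv = f
    for j in range(m + 1):             # deriv is f^(j) during iteration j
        for k in range(n + 1):
            v = evaluate(deriv, k)
            if (j, k) == (p - 1, 0):
                if v != factorial(p - 1) * (-1) ** (n * p) * factorial(n) ** p:
                    return False
                if v % factorial(p) == 0:
                    return False
            elif v % factorial(p) != 0:
                return False
        deriv = derivative(deriv)
    return True

print(check_divisibility(2, 5), check_divisibility(3, 7))  # True True
```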


We now move on to the transcendence of $\pi$. Recall first from the proof of Theorem 6.4.1 that if $\alpha \in \mathbb{C}$ is an algebraic number with

$$f(\alpha) = a_n\alpha^n + a_{n-1}\alpha^{n-1} + \cdots + a_0 = 0, \quad n \ge 1,\ a_n \neq 0,$$

and all $a_i \in \mathbb{Z}$, then $a_n\alpha$ is an algebraic integer.


Theorem 6.4.16 $\pi$ is a transcendental number, that is, transcendental over $\mathbb{Q}$.


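Proof Assume that $\pi$ is an algebraic number. Then $\theta = i\pi$ is also algebraic. Let $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ be the conjugates of $\theta$. Suppose

$$p(x) = q_0 + q_1x + \cdots + q_dx^d \in \mathbb{Z}[x], \quad q_d > 0, \quad \gcd(q_0, \ldots, q_d) = 1$$

is the integral minimal polynomial of $\theta$ over $\mathbb{Q}$. Then $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ are the zeros of this polynomial. Let $t = q_d$. Then from the discussion above $t\theta_i$ is an algebraic integer for all $i$. From $e^{i\pi} + 1 = 0$ and from $\theta_1 = i\pi$ we get that

$$(1 + e^{\theta_1})(1 + e^{\theta_2}) \cdots (1 + e^{\theta_d}) = 0.$$

The product on the left side can be written as a sum of $2^d$ terms $e^{\varphi}$, where $\varphi = \epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$ with $\epsilon_i = 0$ or 1. Let $n$ be the number of terms $\epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$ that are nonzero, and call these $\alpha_1, \ldots, \alpha_n$. We then have an equation

$$q + e^{\alpha_1} + \cdots + e^{\alpha_n} = 0 \qquad (6.4.1)$$

with $q = 2^d - n > 0$. Recall that all $t\alpha_i$ are algebraic integers. We consider the polynomial

$$f(x) = t^{np} x^{p-1}(x - \alpha_1)^p \cdots (x - \alpha_n)^p$$

with $p$ a sufficiently large prime integer. We have $f(x) \in \mathbb{R}[x]$ (in fact $f(x) \in \mathbb{Q}[x]$), since the $\alpha_i$ are algebraic numbers and the elementary symmetric polynomials in $\alpha_1, \ldots, \alpha_n$ are rational numbers.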

Let $I(z_1)$ be defined as in the proof of Theorem 6.4.15, and now let

$$J = I(\alpha_1) + \cdots + I(\alpha_n).$$

From (1) in the proof of Theorem 6.4.15 and (6.4.1) we get

$$J = -q\sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k),$$

with $m = (n + 1)p - 1$.

Now, $\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is a symmetric polynomial in $t\alpha_1, \ldots, t\alpha_n$ with integer coefficients, since the $t\alpha_i$ are algebraic integers. It follows from the main theorem on symmetric polynomials that $\sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer. Further, $f^{(j)}(\alpha_k) = 0$ for $j < p$. Hence $\sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer divisible by $p!$.
Now, $f^{(j)}(0)$ is an integer divisible by $p!$ if $j \neq p - 1$. Moreover $f^{(p-1)}(0) = (p - 1)!(-t)^{np}(\alpha_1 \cdots \alpha_n)^p$ is an integer divisible by $(p - 1)!$ but not by $p!$ if $p$ is sufficiently large. In particular, this is true if $p > |t^n(\alpha_1 \cdots \alpha_n)|$ and also $p > q$. Hence $J$ is a nonzero integer divisible by $(p - 1)!$.

From (2) in the proof of Theorem 6.4.15 we get that

$$|J| \le |\alpha_1|\, e^{|\alpha_1|}\, |f|(|\alpha_1|) + \cdots + |\alpha_n|\, e^{|\alpha_n|}\, |f|(|\alpha_n|) \le c^p$$

for some number $c$ independent of $p$. As in the proof of Theorem 6.4.15, this gives us

$$(p - 1)! \le |J| \le c^p,$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le c\,\frac{c^{p-1}}{(p - 1)!}.$$

This as before gives a contradiction, since $\frac{c^{p-1}}{(p - 1)!} \to 0$ as $p \to \infty$. Therefore, $\pi$ is transcendental.


6.4.5 The Geometry of Numbers—Minkowski Theory 


We consider some ties between algebraic integers and the geometry of real n-space. 


Definition 6.4.4 Let $V$ be an $n$-dimensional vector space over the real numbers $\mathbb{R}$. A lattice in $V$ is a subgroup of the form

$$\Gamma = \{m_1v_1 + \cdots + m_kv_k \,;\, m_i \in \mathbb{Z}\}$$

with $v_1, \ldots, v_k$ linearly independent vectors of $V$. The $k$-tuple $\{v_1, \ldots, v_k\}$ is called a basis and the set

$$\Phi = \{x_1v_1 + \cdots + x_kv_k \,;\, x_i \in \mathbb{R},\ 0 \le x_i < 1\}$$

is a fundamental mesh of the lattice. The lattice is complete if $k = n$.


As an example consider the lattice given by the Gaussian integers in real 2-space. Here $V = \mathbb{R}^2$, $\Gamma = \mathbb{Z} + \mathbb{Z}i = \mathbb{Z}[i]$ and the fundamental mesh is

$$\Phi = \{x + iy \,;\, 0 \le x < 1,\ 0 \le y < 1\}.$$


Now suppose $V$ is a real Euclidean space, that is, a finite-dimensional $\mathbb{R}$-vector space with an inner product. An inner product here means a symmetric, positive definite bilinear form $\langle\ ,\ \rangle : V \times V \to \mathbb{R}$. On such a $V$ we can define a volume. The cube spanned by the standard orthonormal basis $e_1, \ldots, e_n$ has volume 1. More generally the parallelepiped

$$\Phi = \{x_1v_1 + \cdots + x_nv_n \,;\, x_i \in \mathbb{R},\ 0 \le x_i < 1\}$$

spanned by the independent set of vectors $v_1, \ldots, v_n$ has a volume given by

$$\mathrm{vol}(\Phi) = |\det(A)|,$$

where $A = (a_{ij})$ is the transition matrix from the basis $e_1, \ldots, e_n$ to the basis $v_1, \ldots, v_n$, that is

$$v_i = \sum_{j=1}^{n} a_{ij}e_j.$$

As an example, if we use the ordinary Euclidean inner product on $\mathbb{R}^n$ then

$$\mathrm{vol}(\Phi) = \lambda(\Phi),$$

where $\lambda$ is the Lebesgue measure. Further we have $\mathrm{vol}(\Phi) = |\det(\langle v_i, v_j \rangle)|^{1/2}$ since

$$(\langle v_i, v_j \rangle) = \Big(\sum_{k,l} a_{ik}a_{jl}\langle e_k, e_l \rangle\Big) = \Big(\sum_{k} a_{ik}a_{jk}\Big) = AA^t.$$

Let $\Gamma$ be the lattice spanned by $v_1, \ldots, v_n$. If $\Phi$ is the fundamental mesh then we define

$$\mathrm{vol}(\Gamma) = \mathrm{vol}(\Phi).$$

This definition is independent of the choice of basis $v_1, \ldots, v_n$ for the lattice. The reason is that the transition matrix to another basis for the lattice is from $GL(n, \mathbb{Z})$ and so has determinant $\pm 1$.
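The three facts just used can be confirmed on a small integer example. These are $\mathrm{vol}(\Phi) = |\det A|$, $\det(\langle v_i, v_j \rangle) = \det(A)^2$, and invariance under a change of basis from $GL(n, \mathbb{Z})$. The Python sketch below uses the ordinary Euclidean inner product and the Leibniz formula for determinants. The matrices `A` and `U` are arbitrary illustrative choices.

```python
from itertools import permutations

def det(M):
    """Determinant via the Leibniz formula (fine for the small matrices used here)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        # sign of the permutation = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def gram(vectors):
    """Matrix of standard inner products <v_i, v_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

# Rows of A are the coordinates of v_1, v_2 in the orthonormal basis e_1, e_2.
A = [[2, 1], [0, 3]]
U = [[1, 1], [0, 1]]                       # an element of GL(2, Z)
B = [[sum(U[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

print(abs(det(A)), det(gram(A)))           # 6 36: vol = |det A| and det(<v_i, v_j>) = vol^2
print(abs(det(B)))                         # 6: another basis of the same lattice
```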
Now let $K$ be an algebraic number field with $|K : \mathbb{Q}| = n$. Then there are $n$ different embeddings of $K$ into $\mathbb{C}$ which fix $\mathbb{Q}$. Call these $\tau_1, \ldots, \tau_n$. Some of these are real and some are non-real. Let $\rho_1, \ldots, \rho_r$ be the real embeddings $K \to \mathbb{R}$. The non-real complex embeddings $K \to \mathbb{C}$ come in pairs $\sigma_1, \bar{\sigma}_1, \ldots, \sigma_s, \bar{\sigma}_s$, where $\bar{\sigma}_i$ is the complex conjugate of the mapping $\sigma_i$. Altogether we have $n = r + 2s$. For each pair $\sigma_i, \bar{\sigma}_i$ we choose a fixed non-real embedding and call this just $\sigma_i$. We define for $\alpha \in K$ the map $f : K \to \mathbb{R}^n$ by

$$f(\alpha) = (\rho_1(\alpha), \ldots, \rho_r(\alpha), \mathrm{Re}(\sigma_1(\alpha)), \ldots, \mathrm{Re}(\sigma_s(\alpha)), \mathrm{Im}(\sigma_1(\alpha)), \ldots, \mathrm{Im}(\sigma_s(\alpha))).$$

Further we define

$$\langle a, b \rangle = \sum_{i=1}^{r} \rho_i(a)\rho_i(b) + 2\sum_{i=1}^{s} \mathrm{Re}(\sigma_i(a))\,\mathrm{Re}(\sigma_i(b)) + 2\sum_{i=1}^{s} \mathrm{Im}(\sigma_i(a))\,\mathrm{Im}(\sigma_i(b)).$$

We may extend this to an inner product on $\mathbb{R}^{r+2s}$. For the following we consider the metric defined by this inner product.


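Theorem 6.4.17 If $I \neq 0$ is an ideal in $\mathcal{O}_K$ then $\Gamma = f(I)$ is a complete lattice in $\mathbb{R}^{r+2s}$ with

$$\mathrm{vol}(\Gamma) = \sqrt{|d_K|}\,[\mathcal{O}_K : I],$$

where $d_K$ is the discriminant of $K$.

Proof Let $\alpha_1, \ldots, \alpha_n$ be an integral basis for $I$ such that

$$\Gamma = \mathbb{Z}f(\alpha_1) + \cdots + \mathbb{Z}f(\alpha_n).$$

We number the embeddings $\tau : K \to \mathbb{C}$ via $\tau_1, \ldots, \tau_n$ and consider the matrix $A = (\tau_j(\alpha_i))$. Then

$$d(I) = (\det(A))^2 = [\mathcal{O}_K : I]^2 d_K.$$

Moreover $\langle f(\alpha_i), f(\alpha_j) \rangle = \sum_{k=1}^{n} \tau_k(\alpha_i)\overline{\tau_k(\alpha_j)}$, so the Gram matrix is $A\bar{A}^t$ and

$$\mathrm{vol}(\Gamma) = |\det(\langle f(\alpha_i), f(\alpha_j) \rangle)|^{1/2} = |\det(A)|.$$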
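For an imaginary quadratic field ($r = 0$, $s = 1$) the formula $\mathrm{vol}(\Gamma)^2 = |d_K|\,[\mathcal{O}_K : I]^2$ can be checked with exact arithmetic. For $d < 0$ the map $f$ sends $x + y\sqrt{d}$ to $(x, y\sqrt{|d|})$. The inner product above then becomes $2(xx' + |d|yy')$, which is rational when $x, y$ are. The Python sketch below computes $\det(\langle f(\alpha_i), f(\alpha_j) \rangle)$ for $\mathcal{O}_K$ with $d = -1, -3$. It also treats the ideal $\langle 1 + i \rangle \subset \mathbb{Z}[i]$, which consists of the $x + yi$ with $x \equiv y \bmod 2$. That ideal has integral basis $\{1 + i, 2\}$ and index 2. The function name is ours.

```python
from fractions import Fraction

def minkowski_gram_det(d, basis):
    """det(<f(a_i), f(a_j)>) for K = Q(sqrt d), d < 0 (so r = 0, s = 1).
    An element x + y*sqrt d is the pair (x, y); f maps it to (x, y*sqrt|d|),
    and <a, b> = 2(Re a Re b + Im a Im b) = 2(x x' + |d| y y')."""
    def inner(a, b):
        return 2 * (a[0] * b[0] + (-d) * a[1] * b[1])
    (g11, g12), (g21, g22) = [[inner(u, v) for v in basis] for u in basis]
    return g11 * g22 - g12 * g21

half = Fraction(1, 2)
print(minkowski_gram_det(-1, [(1, 0), (0, 1)]))        # 4  = |d_K| for Z[i]
print(minkowski_gram_det(-3, [(1, 0), (half, half)]))  # 3  = |d_K| for d = -3
print(minkowski_gram_det(-1, [(1, 1), (2, 0)]))        # 16 = 4 * 2^2 for I = <1 + i>
```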


In the Minkowski theory we consider in $\mathbb{R}^{r+2s}$ the sets

$$X = \{(x_1, \ldots, x_r, u_1, \ldots, u_s, v_1, \ldots, v_s) \,;\, |x_i| < c_i,\ i = 1, \ldots, r,\ \ |u_j + iv_j| < d_j,\ j = 1, \ldots, s\}$$

with $c_i, d_j > 0$. We use Minkowski's theorem on the existence of lattice points in subsets of $\mathbb{R}^{r+2s}$ of this type (see [Co]), together with an analytic evaluation with respect to the above metric. This gives the following.


Theorem 6.4.18 If $d_K$ is the discriminant of $\mathcal{O}_K$ then

$$\sqrt{|d_K|} \ge \frac{n^n}{n!}\left(\frac{\pi}{4}\right)^{n/2}.$$


As a direct consequence we have the result of Minkowski. 
Theorem 6.4.19 (Minkowski) If $K \neq \mathbb{Q}$ then $|d_K| \neq 1$.
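The passage from Minkowski's estimate to Theorem 6.4.19 is elementary. Take the estimate in the form $\sqrt{|d_K|} \ge b(n) = \frac{n^n}{n!}\left(\frac{\pi}{4}\right)^{n/2}$. Then $b(2) = \pi/2 > 1$, and

$$\frac{b(n+1)}{b(n)} = \left(1 + \tfrac{1}{n}\right)^n\sqrt{\tfrac{\pi}{4}} \ge 2 \cdot 0.88 > 1.$$

So $b(n) > 1$, and hence $|d_K| > 1$, for every $n \ge 2$. The Python sketch below simply tabulates $b(n)$ in floating point to illustrate this.

```python
from math import factorial, pi

def minkowski_lower_bound(n):
    """(n^n / n!) * (pi/4)^(n/2), a lower bound for sqrt|d_K| when [K : Q] = n."""
    return n ** n / factorial(n) * (pi / 4) ** (n / 2)

for n in range(1, 7):
    print(n, round(minkowski_lower_bound(n), 3))
```

For $n = 1$ the bound is below 1, consistent with $d_{\mathbb{Q}} = 1$. For $n = 2$ it already forces $|d_K| \ge 3$.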
A refinement of the analytic evaluation leads to a result of Hermite. 


Theorem 6.4.20 If $D > 0$ is constant then there are only finitely many algebraic number fields with $|d_K| < D$.


For detailed proofs of Theorems 6.4.18, 6.4.19 and 6.4.20 see [Ne]. 


6.4.6 Dirichlet’s Unit Theorem 


We mentioned when discussing real quadratic fields that each unit is, up to sign $\pm 1$, a power of a fundamental unit. This is a special case of the theorem below, called the Dirichlet unit theorem. We state it in general and then give a proof for the quadratic case.


Theorem 6.4.21 (Dirichlet unit theorem) The group of units $U(\mathcal{O}_K)$ of $\mathcal{O}_K$ is the direct product of two groups. The first is the finite cyclic group $U(K)$ of roots of unity contained in $K$. The second is a free abelian group of rank $r + s - 1$, where, as in the last section, $r$ is the number of real embeddings $K \to \mathbb{R}$ and $s$ is the number of pairs of complex non-real embeddings $K \to \mathbb{C}$.

Equivalently there exist units $\epsilon_1, \ldots, \epsilon_t$ in $U(\mathcal{O}_K)$ with $t = r + s - 1$, called fundamental units, such that each unit $u \in U(\mathcal{O}_K)$ is a product

$$u = \zeta\,\epsilon_1^{\nu_1} \cdots \epsilon_t^{\nu_t}$$

with $\nu_i \in \mathbb{Z}$ and $\zeta$ a root of unity contained in $K$.


We prove only the case for quadratic fields $K = \mathbb{Q}(\sqrt{d})$ with $d$ squarefree. For a proof in the general case see [Ne]. We have already considered the units in quadratic imaginary number fields (Theorem 6.4.8). The structure of the unit groups (see [Co]) can be given by

(1) If $d = -1$ then $U(\mathcal{O}_K) = \{\pm 1, \pm i\}$. This is cyclic of order 4.

(2) If $d = -3$ then $U(\mathcal{O}_K) = \{\pm 1, \pm\omega, \pm\bar{\omega}\}$. This is cyclic of order 6 (see exercises).

(3) If $d \neq -1, -3$ and $d < 0$ squarefree then $U(\mathcal{O}_K) = \{-1, 1\}$, which is cyclic of order 2.

For the remainder of this section we assume that $d$ is a positive squarefree integer. As explained in the proof of Theorem 6.4.9, for real quadratic fields we must consider solutions of Pell's equation $x^2 - dy^2 = 1$. We will show that there are infinitely many solutions. First we need some technical results.


Lemma 6.4.8 If $\zeta$ is an irrational real number then there are infinitely many rational numbers $\frac{x}{y}$ with $(x, y) = 1$ and $\left|\frac{x}{y} - \zeta\right| < \frac{1}{y^2}$.




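Proof Consider the partition of the half-open interval $[0, 1)$ by

$$[0, 1) = \left[0, \tfrac{1}{n}\right) \cup \left[\tfrac{1}{n}, \tfrac{2}{n}\right) \cup \cdots \cup \left[\tfrac{n-1}{n}, 1\right).$$

If $a \in \mathbb{R}$ then the fractional part of $a$ is $a - [a]$, where as usual $[\ ]$ is the greatest integer function. The fractional part of any real number lies in a unique member of the above partition.

Consider the fractional parts of $0, \zeta, 2\zeta, \ldots, n\zeta$. These are $n + 1$ numbers in $n$ subintervals, so at least two of them must lie in the same subinterval. Hence there must exist $j, k$ with $0 \le k < j \le n$ such that

$$\left|(j\zeta - [j\zeta]) - (k\zeta - [k\zeta])\right| < \frac{1}{n}.$$

Put $y = j - k$ and $x = [j\zeta] - [k\zeta]$, so that $|x - y\zeta| < \frac{1}{n}$. We may assume that $(x, y) = 1$, for dividing by $(x, y)$ only strengthens the inequality. Further $0 < y \le n$ implies that

$$\left|\frac{x}{y} - \zeta\right| < \frac{1}{ny} \le \frac{1}{y^2}.$$

To obtain infinitely many solutions, note that $\left|\frac{x}{y} - \zeta\right| \neq 0$ since $\zeta$ is irrational. Then choose any integer $m > 1/\left|\frac{x}{y} - \zeta\right|$. The above procedure with $m$ in place of $n$ then gives integers $x_1, y_1$ with $(x_1, y_1) = 1$ and $0 < y_1 \le m$ such that

$$\left|\frac{x_1}{y_1} - \zeta\right| < \frac{1}{my_1} \le \frac{1}{y_1^2} \quad \text{and} \quad \left|\frac{x_1}{y_1} - \zeta\right| < \frac{1}{m} < \left|\frac{x}{y} - \zeta\right|.$$

Thus $\frac{x_1}{y_1}$ is a new solution. Continuing like this then leads to an infinite number of solutions.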
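The pigeonhole argument is constructive, and the following Python sketch follows it step by step. It places the fractional parts of $0, \zeta, \ldots, n\zeta$ into the intervals $[b/n, (b+1)/n)$ and stops at the first collision. It then forms $y = j - k$ and $x = [j\zeta] - [k\zeta]$ and reduces $x/y$ to lowest terms. Floating-point fractional parts are used, which is adequate for illustration but is an assumption of the sketch.

```python
import math

def pigeonhole_approximation(zeta, n):
    """Follow the proof of Lemma 6.4.8: among the fractional parts of
    0, zeta, 2*zeta, ..., n*zeta two lie in the same interval [b/n, (b+1)/n)."""
    seen = {}                                   # interval index -> j
    for j in range(n + 1):
        frac = j * zeta - math.floor(j * zeta)
        box = min(int(frac * n), n - 1)
        if box in seen:
            k = seen[box]                       # k < j
            y = j - k
            x = math.floor(j * zeta) - math.floor(k * zeta)
            g = math.gcd(x, y)
            return x // g, y // g
        seen[box] = j
    raise AssertionError("unreachable: n + 1 numbers in n intervals")

zeta = math.sqrt(2)
for n in (10, 100, 1000):
    x, y = pigeonhole_approximation(zeta, n)
    print(n, x, y, abs(x / y - zeta) < 1 / y**2)
```

For $n = 10$ the collision occurs between $j = 5$ and $k = 0$, giving $\frac{7}{5}$ with $|\frac{7}{5} - \sqrt{2}| \approx 0.014 < \frac{1}{25}$.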


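Lemma 6.4.9 There is a constant $M = M(d)$ such that $|x^2 - dy^2| < M$ has infinitely many integral solutions.

Proof Write $x^2 - dy^2 = (x + \sqrt{d}y)(x - \sqrt{d}y)$. By Lemma 6.4.8 applied to $\zeta = \sqrt{d}$, there exist infinitely many pairs of relatively prime integers $(x, y)$, $y > 0$, satisfying $|x - \sqrt{d}y| < \frac{1}{y}$. It follows that

$$|x + \sqrt{d}y| \le |x - \sqrt{d}y| + 2\sqrt{d}y < \frac{1}{y} + 2\sqrt{d}y.$$

Then

$$|x^2 - dy^2| < \left(\frac{1}{y} + 2\sqrt{d}y\right)\frac{1}{y} = \frac{1}{y^2} + 2\sqrt{d} \le 1 + 2\sqrt{d},$$

so we may take $M = 1 + 2\sqrt{d}$.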
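The bound of Lemma 6.4.9 is easy to observe numerically. The Python sketch below finds pairs $(x, y)$ with $\gcd(x, y) = 1$ and $|x - y\sqrt{d}| < \frac{1}{y}$ by a direct search over $y$. These are pairs of the kind Lemma 6.4.8 supplies. The sketch then lists the values $x^2 - dy^2$, which all satisfy $|x^2 - dy^2| < 1 + 2\sqrt{d}$. Because only finitely many integer values fit under this bound, some value must repeat infinitely often, which is exactly what the proof of Theorem 6.4.22 exploits. The search with `round` and floating point is an illustrative shortcut, and the function name is ours.

```python
import math

def good_approximations(d, count):
    """Pairs (x, y), y > 0, gcd(x, y) = 1, with |x - y*sqrt d| < 1/y,
    found by a direct search over y (x is the nearest integer to y*sqrt d)."""
    r = math.sqrt(d)
    pairs = []
    y = 1
    while len(pairs) < count:
        x = round(y * r)
        if math.gcd(x, y) == 1 and abs(x - y * r) < 1 / y:
            pairs.append((x, y))
        y += 1
    return pairs

d = 7
pairs = good_approximations(d, 12)
norms = [x * x - d * y * y for x, y in pairs]
print(norms)
print(all(abs(v) < 1 + 2 * math.sqrt(d) for v in norms))
```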


Theorem 6.4.22 (Pell's equation) $x^2 - dy^2 = 1$ has infinitely many integral solutions. Further there is a particular solution $(x_1, y_1)$ such that every solution has the form $\pm(x_n, y_n)$, where $x_n + y_n\sqrt{d} = (x_1 + y_1\sqrt{d})^n$ for $n \in \mathbb{Z}$.
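Proof By Lemma 6.4.9, $x^2 - dy^2$ takes only finitely many values on infinitely many integral pairs $(x, y)$ with $x > 0$, $y > 0$. The value 0 is excluded since $\sqrt{d}$ is irrational. Hence there is a nonzero integer $m$ such that $x^2 - dy^2 = m$ for infinitely many such pairs. We may assume that the $x$ components are distinct. Further, since there are only finitely many residue classes modulo $|m|$, one can find pairs $(x_1, y_1)$, $(x_2, y_2)$ such that $x_1 \neq x_2$, $x_1 \equiv x_2 \bmod |m|$ and $y_1 \equiv y_2 \bmod |m|$.

If $\gamma = x + y\sqrt{d}$ let $\bar{\gamma} = x - y\sqrt{d}$ be the conjugate of $\gamma$ and $N(\gamma) = \gamma\bar{\gamma} = x^2 - dy^2$ the norm of $\gamma$. Let $\alpha = x_1 - y_1\sqrt{d}$ and $\beta = x_2 - y_2\sqrt{d}$.

Then $\alpha\bar{\beta} = A + B\sqrt{d}$ with $A = x_1x_2 - dy_1y_2$ and $B = x_1y_2 - x_2y_1$. Modulo $|m|$ we have $A \equiv x_1^2 - dy_1^2 = m \equiv 0$ and $B \equiv x_1y_1 - x_1y_1 = 0$, so $m \mid A$ and $m \mid B$. Thus $\alpha\bar{\beta} = m(u + v\sqrt{d})$ for some integers $u, v$. Taking norms on both sides yields

$$m^2 = m^2(u^2 - dv^2) \implies u^2 - dv^2 = 1.$$

It remains to show that $v \neq 0$.

If $v = 0$ then $u = \pm 1$ and then $\alpha\bar{\beta} = \pm m$. Multiplying by $\beta$ gives $\alpha m = \pm m\beta$, or $\alpha = \pm\beta$. But this implies $x_1 = \pm x_2$, and since $x_1, x_2 > 0$ this means $x_1 = x_2$, a contradiction. Therefore there is a solution $(u, v)$ to Pell's equation with $uv \neq 0$.

We now prove the second assertion. We say that a solution $(x, y)$ is greater than a solution $(u, v)$ if $x + y\sqrt{d} > u + v\sqrt{d}$. Now consider the smallest solution $\alpha = x + y\sqrt{d}$ with $x > 0$, $y > 0$. Such a solution clearly exists and is unique. It is called a fundamental solution. Consider any solution $\beta = u + v\sqrt{d}$ with $u > 0$, $v > 0$. We show that there is a positive integer $n$ such that $\beta = \alpha^n$.

Suppose not. Then choose $n \ge 0$ such that $\alpha^n < \beta < \alpha^{n+1}$. Then $1 < (\bar{\alpha})^n\beta < \alpha$ since $\bar{\alpha} = \alpha^{-1}$. However if $(\bar{\alpha})^n\beta = A + B\sqrt{d}$ then $(A, B)$ is a solution to Pell's equation and $1 < A + B\sqrt{d} < \alpha$.

Now $A + B\sqrt{d} > 0$, so $A - B\sqrt{d} = (A + B\sqrt{d})^{-1} > 0$. Hence $A > 0$. Also $A - B\sqrt{d} = (A + B\sqrt{d})^{-1} < 1$, and hence $B\sqrt{d} > A - 1 \ge 0$. Thus $B > 0$. This contradicts the minimality of $\alpha$. If $\beta = a + b\sqrt{d}$ is a solution with $a > 0$, $b < 0$ then $\beta^{-1} = a - b\sqrt{d} = \alpha^n$ by the above argument, so $\beta = \alpha^{-n}$.

The cases $a < 0$, $b > 0$ and $a < 0$, $b < 0$ lead to $-\alpha^n$ for $n \in \mathbb{Z}$. This proves the theorem.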
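For small $d$ the fundamental solution can be found by brute force, and all further solutions are then obtained from its powers, as the theorem asserts. The Python sketch below searches $y = 1, 2, \ldots$ until $1 + dy^2$ is a perfect square. Since $x$ grows with $y$, the first hit is also the smallest $x + y\sqrt{d}$. The sketch then multiplies out $(x_1 + y_1\sqrt{d})^n$ with integer arithmetic. This direct search is only sensible for small $d$, since fundamental solutions can be enormous, and the function names are ours.

```python
from math import isqrt

def fundamental_solution(d):
    """Smallest solution x, y > 0 of x^2 - d*y^2 = 1 by direct search on y
    (d > 0 not a square)."""
    y = 1
    while True:
        x2 = 1 + d * y * y
        x = isqrt(x2)
        if x * x == x2:
            return x, y
        y += 1

def pell_solutions(d, count):
    """(x_n, y_n) with x_n + y_n sqrt d = (x_1 + y_1 sqrt d)^n for n = 1, ..., count."""
    x1, y1 = fundamental_solution(d)
    x, y = x1, y1
    out = []
    for _ in range(count):
        out.append((x, y))
        # (x + y sqrt d)(x1 + y1 sqrt d) = (x x1 + d y y1) + (x y1 + y x1) sqrt d
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return out

print(pell_solutions(2, 4))   # [(3, 2), (17, 12), (99, 70), (577, 408)]
```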


We can now prove Dirichlet’s Unit Theorem for real quadratic fields. 


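Theorem 6.4.23 Let $K = \mathbb{Q}(\sqrt{d})$ with $d > 0$ and squarefree be a real quadratic field. Then there exists a unit $\epsilon_0 \in \mathcal{O}_K$ such that every unit in $\mathcal{O}_K$ is of the form $\pm\epsilon_0^n$ for $n \in \mathbb{Z}$. It follows that $U(\mathcal{O}_K) \cong \mathbb{Z}_2 \times \mathbb{Z}$, the direct product of $\mathbb{Z}$ and $\mathbb{Z}_2$.

Proof From Theorem 6.4.22 there exist positive integers $x, y$ such that $x^2 - dy^2 = 1$. Thus $\epsilon = x + y\sqrt{d}$ is a unit in $\mathcal{O}_K$ with $\epsilon > 1$. Let $M$ be a fixed real number greater than $\epsilon$. There are at most finitely many $\alpha \in \mathcal{O}_K$, $\alpha = p + q\sqrt{d}$ with $p, q \in \mathbb{Q}$, such that $|\alpha| < M$ and also $|\bar{\alpha}| < M$. Indeed, $2p = \alpha + \bar{\alpha}$ and $2q\sqrt{d} = \alpha - \bar{\alpha}$ are bounded in absolute value by $2M$. By Theorem 6.4.7, $2p$ and $2q$ are integers, and there are only finitely many integers $k$ with $|k| < 2M$.

Let $\beta$ be a unit with $1 < \beta < M$. Such a $\beta$ exists since $M > \epsilon$. Then $N(\beta) = \beta\bar{\beta} = \pm 1$. If $\bar{\beta} = \frac{1}{\beta}$ then $-M < \bar{\beta} < M$, and if $\bar{\beta} = -\frac{1}{\beta}$ then also $-M < \bar{\beta} < M$. Thus there are only finitely many units $\beta$ with $1 < \beta < M$, and of course there is at least one, namely $\epsilon$.

Let $\epsilon_0$ be the smallest unit greater than 1. If $\beta$ is any positive unit then there is a unique integer $s$ with $\epsilon_0^s \le \beta < \epsilon_0^{s+1}$. Then $1 \le \beta\epsilon_0^{-s} < \epsilon_0$. Since $\beta\epsilon_0^{-s}$ is also a unit, the minimality of $\epsilon_0$ forces $\beta\epsilon_0^{-s} = 1$. If $\beta < 0$ then $-\beta$ is positive and $-\beta = \epsilon_0^s$ for some $s \in \mathbb{Z}$, completing the proof.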


If $d = 2$ the fundamental unit is $\epsilon_0 = 1 + \sqrt{2}$, and for $d = 5$ a fundamental unit is $\frac{1}{2}(1 + \sqrt{5})$ (see exercises). However, even for small discriminants computation of the fundamental unit can be quite difficult. For example the fundamental unit for $d = 34$ is $35 + 6\sqrt{34}$.
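The examples above can be reproduced by a direct search. A unit greater than 1 has the form $\frac{x + y\sqrt{d}}{2}$ with $x, y > 0$ and $x^2 - dy^2 = \pm 4$, subject to the integrality conditions of Theorem 6.4.7. The Python sketch below tries $y = 1, 2, \ldots$ and, for each $y$, tries norm $-1$ before norm $+1$ because it gives the smaller $x$. It assumes, without proof here, that among units greater than 1 the size increases with $y$. The result `(x, y)` stands for $\frac{x + y\sqrt{d}}{2}$, and the function name is ours.

```python
from math import isqrt

def fundamental_unit(d):
    """Smallest unit > 1 of O_K, K = Q(sqrt d), d > 1 squarefree, returned as
    (x, y) meaning (x + y*sqrt d)/2 with x, y > 0 and x^2 - d*y^2 = -4 or +4."""
    y = 1
    while True:
        for sign in (-4, 4):                 # norm -1 first: it gives the smaller x
            x2 = d * y * y + sign
            if x2 <= 0:
                continue
            x = isqrt(x2)
            if x * x != x2:
                continue
            if d % 4 == 1:
                ok = (x - y) % 2 == 0        # (x + y sqrt d)/2 integral iff x ≡ y mod 2
            else:
                ok = x % 2 == 0 and y % 2 == 0
            if ok:
                return x, y
        y += 1

for d in (2, 3, 5, 34):
    print(d, fundamental_unit(d))
# 2 (2, 2)     -> 1 + sqrt 2
# 3 (4, 2)     -> 2 + sqrt 3
# 5 (1, 1)     -> (1 + sqrt 5)/2
# 34 (70, 12)  -> 35 + 6 sqrt 34
```

For $d = 2$ the unit $1 + \sqrt{2}$ has norm $-1$, and its square $3 + 2\sqrt{2}$ is the fundamental solution of Pell's equation.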


6.5 The Theory of Ideals 


In analyzing the proofs of unique factorization, the uniqueness part, whether in $\mathbb{Z}$, a general Euclidean domain or a principal ideal domain, hinged on the respective analog of Euclid's Lemma. That is, if $p$ is a prime and $p \mid ab$ then $p \mid a$ or $p \mid b$. In these cases this lemma depended on the fact that the principal ideal $\langle p \rangle$ generated by a prime $p$ was both a prime ideal and a maximal ideal. For the algebraic number rings $\mathcal{O}_K$ we have seen that there are always prime factorizations (Theorem 6.4.1), but these are not always unique. Hence Euclid's Lemma cannot hold in general. The problem is that the principal ideal generated by a prime $\pi \in \mathcal{O}_K$ need not be a prime ideal. Kummer addressed this problem by adjoining to $\mathcal{O}_K$ ideal numbers which generated prime ideals. He could recover unique factorization, but the components of the factorization did not always lie in the ring $\mathcal{O}_K$. Dedekind took a different approach. Rather than work with factorizations of the elements of $\mathcal{O}_K$, he worked with ideals in $\mathcal{O}_K$. He was then able to show that for all $\mathcal{O}_K$ there is unique factorization of ideals into prime ideals. Further, as consequences of this factorization, many results in elementary number theory such as Fermat's theorem and the Chinese Remainder Theorem can be recovered, albeit in terms of ideals.

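Since each algebraic number ring $\mathcal{O}_K$ is an integral domain we can apply the material on ideals introduced in Section 6.2. Recall that an ideal $I$ in $\mathcal{O}_K$ is a subring of $\mathcal{O}_K$ such that $\lambda I \subset I$ for all $\lambda \in \mathcal{O}_K$. Equivalently $I \subset \mathcal{O}_K$ is an ideal if $\lambda\alpha + \gamma\beta \in I$ whenever $\alpha, \beta \in I$ and $\lambda, \gamma \in \mathcal{O}_K$. If $\alpha_1, \ldots, \alpha_k \in \mathcal{O}_K$ then the set

$$\langle \alpha_1, \ldots, \alpha_k \rangle = \{\lambda_1\alpha_1 + \cdots + \lambda_k\alpha_k \,;\, \lambda_i \in \mathcal{O}_K\}$$

forms an ideal called the ideal generated by $\alpha_1, \ldots, \alpha_k$. An ideal generated by finitely many elements in this way is called finitely generated. The ideal $\langle \alpha \rangle$ is the principal ideal generated by $\alpha$. An ideal $I \neq \mathcal{O}_K$ is a prime ideal if whenever $\alpha\beta \in I$ then either $\alpha \in I$ or $\beta \in I$. An ideal $I \neq \mathcal{O}_K$ is a maximal ideal if whenever $\alpha \notin I$ then $\langle \alpha, I \rangle = \mathcal{O}_K$.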
First we show that every nonzero ideal $I \subset \mathcal{O}_K$ has an integral basis and hence is finitely generated. This fact follows directly from the fact that $\mathcal{O}_K$ is a finitely generated free $\mathbb{Z}$-module together with results on submodules of such modules. More simply, it follows from the basis theorem for finitely generated abelian groups (see Chapter 2 or [CFR]). However we give a direct proof mimicking the existence of an integral basis for all of $\mathcal{O}_K$.


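Theorem 6.5.1 If $K$ has degree $n$ over $\mathbb{Q}$ then each nonzero ideal $I \subset \mathcal{O}_K$ has an integral basis of rank $n$. That is, there exist $\omega_1, \ldots, \omega_n \in I$ such that any $\alpha \in I$ can be expressed uniquely as

$$\alpha = m_1\omega_1 + \cdots + m_n\omega_n$$

with $m_i \in \mathbb{Z}$. In particular any ideal in $\mathcal{O}_K$ is finitely generated of rank $\le n$.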


Proof Suppose $A \subset \mathcal{O}_K \subset K$ is a nonzero ideal and suppose $|K : \mathbb{Q}| = n$. If $A$ has an integral basis $\omega_1, \ldots, \omega_k$ then these are linearly independent (as elements of $K$) over $\mathbb{Q}$. Since the dimension of $K$ over $\mathbb{Q}$ is $n$ it follows that $k \le n$. Suppose then that $\beta_1, \ldots, \beta_n$ are integers in $\mathcal{O}_K$ which form a basis for $K$ over $\mathbb{Q}$. In the proof of Theorem 6.4.4 it was shown that $K$ has such a basis. If $\alpha \in A$ with $\alpha \neq 0$, then $\alpha\beta_1, \ldots, \alpha\beta_n$ all lie in $A$, since $A$ is an ideal, and they are linearly independent. However, since they are in $A$, they can be linearly expressed in terms of $\omega_1, \ldots, \omega_k$, which is impossible if $k < n$. Therefore if $A$ has an integral basis then it must have $n$ elements in it.

The proof that $A$ does indeed have an integral basis is almost identical to the proof of Theorem 6.4.4. Consider all sets $\omega_1, \ldots, \omega_n$ in $A$ which are linearly independent over $\mathbb{Q}$. The set $\alpha\beta_1, \ldots, \alpha\beta_n$ is an example. For each such set the discriminant $\Delta(\omega_1, \ldots, \omega_n)$ is then a nonzero rational integer. Therefore we can choose a set $\omega_1, \ldots, \omega_n$ for which $|\Delta(\omega_1, \ldots, \omega_n)|$ is minimal. This is an integral basis for $A$, the details being identical to those in Theorem 6.4.4 (see exercises).


The fact that each ideal in $\mathcal{O}_K$ is finitely generated, of rank at most $n$, implies immediately that each $\mathcal{O}_K$ is Noetherian. That is, each ring of algebraic integers satisfies the ascending chain condition on ideals. Hence each ascending chain of ideals in any $\mathcal{O}_K$ eventually becomes stationary (see Section 6.2.3).

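Clearly two ideals $A = \langle \alpha_1, \ldots, \alpha_m \rangle$ and $B = \langle \beta_1, \ldots, \beta_n \rangle$ are the same if each $\alpha_i$ is an integral (from $\mathcal{O}_K$) linear combination of the $\beta_j$ and each $\beta_j$ is an integral linear combination of the $\alpha_i$. From this we obtain the following.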


Lemma 6.5.1 If $\alpha, \beta \neq 0$ then $\langle \alpha \rangle = \langle \beta \rangle$ if and only if $\alpha$ and $\beta$ are associates.


Crucial to unique factorization in $\mathbb{Z}$, and in Euclidean domains in general, was that each nonzero prime ideal is maximal. This is true in all $\mathcal{O}_K$.


Theorem 6.5.2 An ideal $I \subset \mathcal{O}_K$ with $I \neq \langle 0 \rangle$ is a prime ideal if and only if it is a maximal ideal.
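Proof Suppose $P = \langle \omega_1, \ldots, \omega_s \rangle$ is a maximal ideal in $\mathcal{O}_K$. We show that $P$ is also a prime ideal. Suppose $\alpha\beta \in P$ and suppose that $\alpha \notin P$; we must show that $\beta \in P$. Let $P' = \langle \omega_1, \ldots, \omega_s, \alpha \rangle$. Since $\{\omega_1, \ldots, \omega_s\} \subset P'$ it follows that $P \subset P'$. Since $P$ is maximal, either $P' = P$ or $P' = \mathcal{O}_K$. If $P = P'$ then $\alpha \in P' = P$, contradicting the assumption that $\alpha \notin P$. Therefore $P' = \mathcal{O}_K$ and hence $1 \in P'$. It follows that

$$1 = \alpha_1\omega_1 + \cdots + \alpha_s\omega_s + \alpha_{s+1}\alpha$$

with $\alpha_1, \ldots, \alpha_s, \alpha_{s+1} \in \mathcal{O}_K$. Multiplying through by $\beta$ yields

$$\beta = (\beta\alpha_1)\omega_1 + \cdots + (\beta\alpha_s)\omega_s + \alpha_{s+1}\alpha\beta.$$

Since $\omega_1, \ldots, \omega_s \in P$, $\alpha\beta \in P$ and $P$ is an ideal, it follows that $\beta \in P$. Therefore $P$ is a prime ideal.

Conversely suppose $P$ is a nonzero prime ideal. We show that it is maximal. Recall that if $R$ is a commutative ring and $I$ is an ideal then $I$ is maximal if and only if $R/I$ is a field (see Section 6.2). Let $\alpha \neq 0$ be an element of $P$. Then $N(\alpha)/\alpha$ is a product of conjugates of $\alpha$, hence an algebraic integer, and it lies in $K$, so it belongs to $\mathcal{O}_K$. Therefore the norm $N(\alpha) = \alpha \cdot (N(\alpha)/\alpha)$ is also in $P$. Since the norm is a nonzero rational integer, it follows that $P \cap \mathbb{Z} \neq \langle 0 \rangle$. Since $P$ is a prime ideal, $P \cap \mathbb{Z}$ is a nonzero prime ideal in $\mathbb{Z}$. Hence $P \cap \mathbb{Z} = p\mathbb{Z}$ for some rational prime $p$. Then $\mathbb{Z}/p\mathbb{Z} = \mathbb{Z}_p$, a finite field. The quotient ring $\mathcal{O}_K/P$ is an integral domain since $P$ is prime. It is obtained by adjoining to the finite field $k = \mathbb{Z}/p\mathbb{Z}$ the images of an integral basis of $\mathcal{O}_K$, and these are algebraic over $k$. However, adjoining algebraic elements to a field inside an integral domain forms a field. Therefore the quotient ring $\mathcal{O}_K/P$ forms a field and therefore $P$ is a maximal ideal.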


6.5.1 Unique Factorization of Ideals 


We now introduce a product on the set of ideals of $\mathcal{O}_K$. Relative to this product we will show that there is unique factorization in terms of prime ideals.


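Definition 6.5.1 If $A = \langle \alpha_1, \ldots, \alpha_m \rangle$ and $B = \langle \beta_1, \ldots, \beta_n \rangle$ are ideals in $\mathcal{O}_K$ then their product

$$AB = \langle \alpha_1\beta_1, \alpha_1\beta_2, \ldots, \alpha_i\beta_j, \ldots, \alpha_m\beta_n \rangle$$

is the ideal generated by all products of the generating elements.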


It is a simple exercise to show that this definition is independent of the generating 
systems chosen. 
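For quadratic fields the ideal product can be computed concretely. Write $a + b\sqrt{-5} \in \mathbb{Z}[\sqrt{-5}]$ as the pair $(a, b)$. An ideal then becomes a full lattice in $\mathbb{Z}^2$. As a $\mathbb{Z}$-module it is spanned by the products of its generators with the $\mathbb{Z}$-basis $\{1, \sqrt{-5}\}$.

The Python sketch below reduces such a spanning set to Hermite normal form. This is a canonical basis $(a, b), (0, c)$ with $a, c > 0$ and $0 \le b < c$, so two ideals are equal exactly when their triples agree. It checks that $P = \langle 2, 1 + \sqrt{-5} \rangle$ has index 2 and equals $\langle 2, 1 - \sqrt{-5} \rangle$. It also checks $P^2 = \langle 2 \rangle$, in the ring $\mathbb{Z}[\sqrt{-5}]$ whose non-unique factorization was noted earlier. The reduction routine and all names are ours.

```python
from math import gcd

D = -5  # we work in Z[sqrt(-5)]; a + b*sqrt(D) is the pair (a, b)

def mul(u, v):
    return (u[0] * v[0] + D * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def z_generators(gens):
    """Z-module generators of the ideal <gens>: each g times 1 and times sqrt(D)."""
    return [w for g in gens for w in (g, mul(g, (0, 1)))]

def hnf(vectors):
    """(a, b, c) with span_Z(vectors) = Z(a, b) + Z(0, c), a > 0, c > 0, 0 <= b < c.
    Assumes the lattice has full rank."""
    rows = [list(v) for v in vectors]
    # Euclidean algorithm on first coordinates (row operations keep the lattice)
    while sum(1 for r in rows if r[0] != 0) > 1:
        rows.sort(key=lambda r: (r[0] == 0, abs(r[0])))
        pivot = rows[0]
        for r in rows[1:]:
            if r[0] != 0:
                q = r[0] // pivot[0]
                r[0] -= q * pivot[0]
                r[1] -= q * pivot[1]
    rows.sort(key=lambda r: (r[0] == 0, abs(r[0])))
    a, b = rows[0]
    if a < 0:
        a, b = -a, -b
    c = 0
    for r in rows[1:]:
        c = gcd(c, r[1])
    return a, b % c, c

def ideal_product(A, B):
    return [mul(alpha, beta) for alpha in A for beta in B]

P = [(2, 0), (1, 1)]       # <2, 1 + sqrt(-5)>
Q = [(2, 0), (1, -1)]      # <2, 1 - sqrt(-5)>, the same ideal as P
print(hnf(z_generators(P)), hnf(z_generators(Q)))   # (1, 1, 2) (1, 1, 2): index 2
print(hnf(z_generators(ideal_product(P, P))))       # (2, 0, 2)
print(hnf(z_generators([(2, 0)])))                  # (2, 0, 2): so P^2 = <2>
```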

Now we say that $A$ divides $B$, denoted $A \mid B$, if there exists an ideal $C$ such that $B = AC$. $A$ is then called a factor of $B$. $A$ is a divisor of $B$ if $B \subset A$. Finally $A$ is an irreducible ideal if the only factors of $A$ are $A$ and $\langle 1 \rangle = \mathcal{O}_K$.

The concepts of factor and divisor will turn out to be equivalent, but we will prove the main theorem before proving this. We would like to use the irreducible ideals in the role of primes. However, for the time being we will not call them prime ideals, reserving that term for the previous definition. We will eventually prove that an ideal $I \subset \mathcal{O}_K$ is irreducible if and only if it is a prime ideal. Therefore, as in the case of rational integers, for ideals the terms prime and irreducible will be interchangeable.

First we show that a factor is a divisor. 


Lemma 6.5.2 If $A \mid B$ then $B \subset A$, that is, a factor is a divisor.


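Proof Suppose $B = AC$ so that $A \mid B$. Let

$$A = \langle \alpha_1, \ldots, \alpha_s \rangle, \quad B = \langle \beta_1, \ldots, \beta_t \rangle, \quad C = \langle \gamma_1, \ldots, \gamma_u \rangle.$$

Then

$$\langle \beta_1, \ldots, \beta_t \rangle = \langle \alpha_1\gamma_1, \ldots, \alpha_i\gamma_j, \ldots, \alpha_s\gamma_u \rangle.$$

Therefore for each $k = 1, \ldots, t$,

$$\beta_k = \sum_{i,j} \delta_{ijk}\,\alpha_i\gamma_j \quad \text{with } \delta_{ijk} \in \mathcal{O}_K.$$

This implies that

$$\beta_k = \sum_i \Big(\sum_j \delta_{ijk}\gamma_j\Big)\alpha_i.$$

Hence each $\beta_k$ is an integral (from $\mathcal{O}_K$) linear combination of the $\alpha_i$ and thus $\beta_k \in A$. Therefore $B \subset A$.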


To arrive at the prime factorization we need certain finiteness conditions. 


Lemma 6.5.3 A rational integer $m \neq 0$ belongs to at most finitely many ideals in $\mathcal{O}_K$.


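Proof Suppose $m$ is a rational integer and $m \in A$, where $A$ is an ideal in $\mathcal{O}_K$. Since both $\pm m \in A$ we may assume that $m > 0$. Let $\omega_1, \ldots, \omega_n$ be an integral basis for $\mathcal{O}_K$. If $A = \langle \alpha_1, \ldots, \alpha_s \rangle$ then each $\alpha_i$ may be written as

$$\alpha_i = \sum_{j=1}^{n} c_{ij}\omega_j,$$

where the $c_{ij}$ are rational integers. Then for each $j = 1, \ldots, n$

$$c_{ij} = q_{ij}m + r_{ij}, \quad 0 \le r_{ij} < m.$$

Then

$$\alpha_i = \sum_{j}(q_{ij}m + r_{ij})\omega_j = m\sum_{j} q_{ij}\omega_j + \sum_{j} r_{ij}\omega_j = m\gamma_i + \beta_i,$$

where $\gamma_i$ and $\beta_i$ are integers. Since $0 \le r_{ij} < m$, each $\beta_i$ can take on only finitely many values (at most $m^n$). Now since $m \in A$ we have

$$A = \langle \alpha_1, \ldots, \alpha_s \rangle = \langle \alpha_1, \ldots, \alpha_s, m \rangle = \langle m\gamma_1 + \beta_1, \ldots, m\gamma_s + \beta_s, m \rangle.$$

However since $m \in A$ it follows that $m\gamma_i \in A$ for all $i$ and thus

$$A = \langle \beta_1, \ldots, \beta_s, m \rangle.$$

Since each $\beta_i$ lies in a fixed finite set, there are only finitely many choices for the set $\{\beta_1, \ldots, \beta_s\}$ and hence only finitely many choices for $A$.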
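In $\mathbb{Z}[i]$ the finitely many ideals containing a given $m > 0$ can be listed explicitly, which makes Lemma 6.5.3 concrete. Writing $x + yi$ as $(x, y)$, each ideal is a lattice with a unique Hermite normal form basis $(a, b), (0, c)$, where $a, c > 0$ and $0 \le b < c$. If the ideal contains $m$ and $mi$, then $a$ and $c$ must divide $m$, so the search below is finite, mirroring the finiteness argument in the proof. A lattice is an ideal exactly when it is closed under multiplication by $i$, which sends $(x, y)$ to $(-y, x)$. The function names are ours.

```python
def contains(a, b, c, x, y):
    """Is (x, y) in the lattice Z(a, b) + Z(0, c)?"""
    if x % a:
        return False
    k = x // a
    return (y - k * b) % c == 0

def ideals_containing(m):
    """All ideals of Z[i] containing the rational integer m > 0, each given by its
    Hermite normal form (a, b, c): the lattice Z(a + b i) + Z(c i)."""
    divisors = [t for t in range(1, m + 1) if m % t == 0]
    found = []
    for a in divisors:
        for c in divisors:
            for b in range(c):
                if not (contains(a, b, c, m, 0) and contains(a, b, c, 0, m)):
                    continue
                # closed under multiplication by i: x + y i -> -y + x i
                if contains(a, b, c, -b, a) and contains(a, b, c, -c, 0):
                    found.append((a, b, c))
    return found

for m in (2, 3, 5, 10):
    print(m, len(ideals_containing(m)))   # 2 3 / 3 2 / 5 4 / 10 12
```

For $m = 10$ the 12 ideals found are consistent with $\langle 10 \rangle = \langle 1 + i \rangle^2 \langle 2 + i \rangle \langle 2 - i \rangle$ having $3 \cdot 2 \cdot 2$ ideal divisors.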


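Lemma 6.5.4 An ideal $A \neq \langle 0 \rangle$ has only a finite number of divisors and hence only a finite number of factors.

Proof Let $A$ be an ideal with $A \neq \langle 0 \rangle$. If $\alpha \in A$ with $\alpha \neq 0$ then the norm $N(\alpha) \in A$. Since $\alpha$ is a nonzero algebraic integer, $N(\alpha)$ is a nonzero rational integer. It follows that $A \cap \mathbb{Z} \neq \{0\}$. Every divisor of $A$ contains $A$ and hence contains $N(\alpha)$. But by Lemma 6.5.3, $N(\alpha)$ can belong to only finitely many ideals, so $A$ can have only finitely many divisors. Since each factor is a divisor (Lemma 6.5.2), $A$ has only finitely many factors.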


We now state the main result. 


Theorem 6.5.3 (Unique Factorization of Ideals) Every ideal $I \subset \mathcal{O}_K$ with $I \neq \langle 0 \rangle$ and $I \neq \langle 1 \rangle$ can be factored into a product of prime ideals. This factorization is unique except for the ordering of the factors.


The proof is broken into several steps. First we introduce some further general 
ideas from algebra. 


Definition 6.5.2 If $R$ is a commutative ring with an identity then a module over $R$, or an $R$-module, is an abelian group $M$ which allows scalar multiplication from $R$ satisfying

(1) $rv \in M$ if $r \in R$, $v \in M$.

(2) $r(u + v) = ru + rv$ for $r \in R$, $u, v \in M$.

(3) $(r + s)v = rv + sv$ for $r, s \in R$, $v \in M$.

(4) $(rs)v = r(sv)$ for $r, s \in R$, $v \in M$.

(5) $1v = v$ for $v \in M$.


Therefore we can think of a module as a vector space where the set of scalars is just a commutative ring rather than a field. Clearly any abelian group is a $\mathbb{Z}$-module. A subset $\{m_i\}$ of elements of $M$ generates $M$ if every element of $M$ is an $R$-linear combination of finitely many elements from $\{m_i\}$. If a set of generators is finite then $M$ is a finitely generated module over $R$. If $M$ is a module then an $R$-basis for $M$ is a generating set which is linearly independent over $R$. Not every $R$-module has an $R$-basis. An $R$-module which has an $R$-basis is called a free $R$-module. A submodule $N$ is a subgroup of $M$ which is also a module, that is, which is closed under scalar multiplication from $R$. The following is important for our further work. For a proof see [CFR].


Theorem 6.5.4 Let $R$ be a principal ideal domain and $M$ a free $R$-module. If $m_1, \dots, m_s$ is a finite $R$-basis and $N$ is a nonzero submodule of $M$ then $N$ is also free and has a finite basis with $\le s$ elements.


Since each abelian group is a $\mathbb{Z}$-module and $\mathbb{Z}$ is a principal ideal domain, applying this theorem to abelian groups gives the basis theorem for finitely generated abelian groups.
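For $R = \mathbb{Z}$ the content of Theorem 6.5.4 can be seen concretely: a subgroup of $\mathbb{Z}^s$ given by finitely many generators has a basis with at most $s$ elements. The Python sketch below is not the proof in [CFR]; it finds such a basis by integer row reduction, running Euclid's algorithm down each column (essentially a Hermite normal form computation). The function name `lattice_basis` is our own.

```python
def lattice_basis(gens, s):
    """Return a Z-basis (in echelon form) of the subgroup of Z^s spanned by gens."""
    rows = [list(v) for v in gens if any(v)]
    basis = []
    for col in range(s):
        # Euclid's algorithm down column `col`, using only row operations that
        # are invertible over Z, until at most one row is nonzero in this column.
        while True:
            live = [r for r in rows if r[col] != 0]
            if len(live) <= 1:
                break
            live.sort(key=lambda r: abs(r[col]))
            pivot = live[0]
            for r in live[1:]:
                q = r[col] // pivot[col]
                for k in range(s):
                    r[k] -= q * pivot[k]
        if live:
            # The surviving row is a basis vector with its pivot in this column.
            basis.append(live[0])
            rows = [r for r in rows if r is not live[0]]
        rows = [r for r in rows if any(r)]   # drop rows that became zero
    return basis
```

For instance, the generators $(2, 0), (0, 2), (1, 1)$ of the subgroup $\{(a, b) : a \equiv b \bmod 2\}$ of $\mathbb{Z}^2$ reduce to the basis $(1, 1), (0, -2)$. Since only invertible integer row operations are used, the subgroup spanned never changes, and the echelon shape makes the output vectors linearly independent.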

Now we return to the proof of the main theorem. To obtain the existence of a 
unique factorization we extend the definition of an ideal. 


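Definition 6.5.3 A fractional ideal in $K$ is a nonzero finitely generated $\mathcal{O}_K$-submodule of $K$. That is, $I \subseteq K$ is a fractional ideal if $I$ is a nonzero, finitely generated additive subgroup of $K$ closed under multiplication by elements of $\mathcal{O}_K$.

A nonzero ordinary ideal $A \subseteq \mathcal{O}_K$ is then also a fractional ideal. In this context we call an ordinary ideal an integral ideal. For example, for $K = \mathbb{Q}$ and $\mathcal{O}_K = \mathbb{Z}$, the set $\frac{1}{2}\mathbb{Z}$ is a fractional ideal which is not an integral ideal.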


Notice that fractional ideals can be multiplied in the same manner as ordinary 
ideals to obtain other fractional ideals. We next define an addition of fractional 
ideals. 


Definition 6.5.4 If $A$ and $B$ are fractional ideals then the sum is defined by

$$A + B = \{\alpha + \beta : \alpha \in A, \beta \in B\}.$$

The sum of fractional ideals is again a fractional ideal (see exercises).

Lemma 6.5.5 Every integral ideal contains a product of prime ideals.


Proof Let $S$ consist of the set of integral ideals for which this statement is false. If $S$ is nonempty then, since $\mathcal{O}_K$ satisfies the ACC on ideals (is Noetherian), $S$ must have a maximal element $A$. Therefore $A$ is an integral ideal which is not prime and for which any ideal properly containing $A$ must contain a product of prime ideals. Since $A$ is not a prime ideal there must exist elements $\alpha, \beta$ both not in $A$ but with $\alpha\beta \in A$. Then $A_1 = \langle A, \alpha \rangle$ and $B_1 = \langle A, \beta \rangle$ both properly contain $A$ and hence both contain a product of prime ideals. Then $A_1B_1$ also contains a product of prime ideals. But

$$A_1B_1 \subseteq A^2 + \alpha A + \beta A + \langle \alpha\beta \rangle \subseteq A$$

since $\alpha\beta \in A$. But then $A$ contains a product of prime ideals, which is a contradiction.

Therefore the set $S$ must be empty and hence every integral ideal contains a product of prime ideals.
We also need the following, which gives an inverse under this multiplication for ordinary ideals.

Definition 6.5.5 For an integral ideal $A \subseteq \mathcal{O}_K$ we define

$$A^{-1} = \{\alpha \in K : \alpha A \subseteq \mathcal{O}_K\}.$$

Lemma 6.5.6 For an integral ideal $A \subseteq \mathcal{O}_K$, the set $A^{-1}$ is a fractional ideal and $\mathcal{O}_K \subseteq A^{-1}$. Further if $A$ is a proper ideal then $A^{-1}$ properly contains $\mathcal{O}_K$.


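Proof We leave the proof that $A^{-1}$ is again a fractional ideal to the exercises and prove that if $A$ is a proper ideal then $A^{-1}$ properly contains $\mathcal{O}_K$. We must show that there is an element of $A^{-1}$ which is not an algebraic integer. Choose an $\alpha \in A$ with $\alpha \neq 0$. From Lemma 6.5.5 there is a set of prime ideals $P_1, \dots, P_s$ satisfying

$$P_1 \cdots P_s \subseteq \langle \alpha \rangle \subseteq A.$$

Choose such a set of prime ideals with minimal possible $s$. Since $A \neq \mathcal{O}_K$, by the Noetherian property $A$ must be contained in some maximal (and hence prime) ideal $P$. Therefore we have

$$P_1 \cdots P_s \subseteq P.$$

If $P \neq P_i$ for all $i = 1, \dots, s$ then, since each $P_i$ is maximal, $P_i \not\subseteq P$, so there is an $\alpha_i \in P_i$ with $\alpha_i \notin P$, and $\alpha_1 \cdots \alpha_s \in P$. This contradicts the fact that $P$ is a prime ideal. Therefore $P = P_i$ for some $i$. Without loss of generality assume $P = P_1$. We now have

$$P_1P_2 \cdots P_s \subseteq \langle \alpha \rangle \subseteq A \subseteq P_1.$$

Since $s$ was minimal, $P_2 \cdots P_s$ is not contained in $\langle \alpha \rangle$. Therefore there is a $\beta \in P_2 \cdots P_s$ with $\beta \notin \langle \alpha \rangle$. Let $\gamma = \alpha^{-1}\beta$. Then $\gamma$ is not an algebraic integer, since $\beta \notin \alpha\mathcal{O}_K$. However

$$\gamma A = \alpha^{-1}\beta A \subseteq \alpha^{-1}\beta P_1 \subseteq \alpha^{-1}P_1P_2 \cdots P_s \subseteq \alpha^{-1}\langle \alpha \rangle = \mathcal{O}_K.$$

Hence by definition $\gamma \in A^{-1}$.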


Lemma 6.5.7 If $A$ is an integral ideal then $A^{-1}A = \mathcal{O}_K$.


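Proof Let $B = A^{-1}A$. By the definition of $A^{-1}$ we have $B \subseteq \mathcal{O}_K$, and $A \subseteq B$ since $1 \in A^{-1}$, so $B$ is an integral ideal. Then

$$AA^{-1}B^{-1} = BB^{-1} \subseteq \mathcal{O}_K \implies A^{-1}B^{-1} \subseteq A^{-1}.$$

It follows that for any $\alpha \in B^{-1}$ we must have $A^{-1}\alpha \subseteq A^{-1}$ and therefore $A^{-1}\alpha^k \subseteq A^{-1}$ for all natural numbers $k$. Since $1 \in A^{-1}$, this gives $\alpha^k \in A^{-1}$ for all $k$, so $\mathcal{O}_K[\alpha] \subseteq A^{-1}$. Choose a nonzero rational integer $m \in A$ (possible as in the proof of Lemma 6.5.4). Then $mA^{-1} \subseteq \mathcal{O}_K$, so $A^{-1} \subseteq \frac{1}{m}\mathcal{O}_K$, a free $\mathbb{Z}$-module of rank $[K : \mathbb{Q}]$. By Theorem 6.5.4 its submodule $\mathcal{O}_K[\alpha]$ is therefore a finitely generated $\mathbb{Z}$-module, so $\alpha$ is integral over $\mathbb{Z}$. Since $\mathcal{O}_K$ is integrally closed in $K$ it follows that $\alpha \in \mathcal{O}_K$. Therefore $B^{-1} \subseteq \mathcal{O}_K$ and hence $B^{-1} = \mathcal{O}_K$. It follows that $B = \mathcal{O}_K$, for otherwise by Lemma 6.5.6 $\mathcal{O}_K$ would be properly contained in $B^{-1}$.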


Lemma 6.5.8 Every integral ideal is a product of prime ideals. 


Proof From Lemma 6.5.5 we know that any integral ideal contains a product of prime ideals. If a proper integral ideal contains a single prime ideal it must coincide with that ideal, since prime ideals are maximal. We now do induction on the length of a product of prime ideals contained in an integral ideal and assume that any integral ideal containing a product of fewer than $n$ prime ideals is a product of prime ideals. Now suppose $A$ is an integral ideal and $A$ contains a product of $n$ prime ideals:

$$P_1P_2 \cdots P_n \subseteq A.$$

As in the proof of Lemma 6.5.6, choose a maximal ideal $P$ containing $A$ so that we have

$$P_1P_2 \cdots P_n \subseteq A \subseteq P.$$

Again as in the proof of Lemma 6.5.6, $P$ must coincide with one of the $P_i$, say $P_1$, so that, multiplying by $P_1^{-1}$ and using Lemma 6.5.7, we have

$$P_1P_2 \cdots P_n \subseteq A \subseteq P_1 \implies P_2 \cdots P_n \subseteq P_1^{-1}A \subseteq \mathcal{O}_K.$$

The integral ideal $P_1^{-1}A$ now contains a product of fewer than $n$ prime ideals, so by our inductive hypothesis we have

$$P_1^{-1}A = Q_1 \cdots Q_t,$$

where each $Q_i$ is a prime ideal. But then

$$A = P_1P_1^{-1}A = P_1Q_1 \cdots Q_t$$

is a product of prime ideals.


Now that we have established that each integral ideal is a product of prime ideals 
we must show that this product is unique up to ordering. 


Lemma 6.5.9 Let $P_1 \cdots P_s = Q_1 \cdots Q_t$, where the $P_i$ and $Q_j$ are all prime ideals. Then $s = t$ and the $Q_j$ are just a rearrangement of the $P_i$.

Proof The proof mimics the proof of the uniqueness of factorization of the rational integers. Since $Q_1 \cdots Q_t \subseteq Q_1$ we have

$$P_1 \cdots P_s = Q_1 \cdots Q_t \subseteq Q_1.$$

Since $Q_1$ is prime, and hence maximal, as in the proofs of the previous lemmas $Q_1$ must coincide with some $P_i$. Without loss of generality we may assume then that $Q_1 = P_1$. Multiplying by $P_1^{-1}$ and using Lemma 6.5.7 we then have

$$P_1^{-1}P_1P_2P_3 \cdots P_s = P_1^{-1}P_1Q_2 \cdots Q_t \implies P_2 \cdots P_s = Q_2 \cdots Q_t.$$

Continuing in this manner we get the result. The two products must run out at the same time, since $\mathcal{O}_K$ is not contained in any prime ideal and so cannot equal a nonempty product of prime ideals.


As an immediate consequence of this lemma we get the following corollary which 
is the required unique factorization. 


Corollary 6.5.1 Suppose $A = P_1 \cdots P_s = Q_1 \cdots Q_t$ are two expressions for the integral ideal $A$ as a product of prime ideals. Then $s = t$ and the $Q_j$ are just a rearrangement of the $P_i$.


This series of lemmas completes the proof of the unique factorization theorem. If $A$ is a nonzero proper integral ideal then from Lemma 6.5.8 it can be expressed as a product of prime ideals. Then from Corollary 6.5.1 this expression is unique.

Finally we show that a divisor is a factor. Hence by the uniqueness theorem if A 
is a prime ideal it is also an irreducible ideal. Therefore for ideals the terms prime 
and irreducible become interchangeable. 


Lemma 6.5.10 Let A and B be integral ideals. Then A is a divisor of B if and only 
if A is a factor of B. 


Proof We have already seen that if $A$ is a factor of $B$ then $A$ is a divisor, that is, if $A \mid B$ then $B \subseteq A$. We must show then that if $A$ is a divisor of $B$, that is, $B \subseteq A$, then $A$ is a factor of $B$. Hence we must show that if $B \subseteq A$ then there is an ideal $C$ with $B = AC$. Now from unique factorization we have

$$A = P_1^{e_1} \cdots P_r^{e_r}$$

for some prime ideals $P_1, \dots, P_r$. Here we have combined identical prime ideals to an exponent as in the standard form of a rational integer. Since $B \subseteq A$ it is an easy consequence of the unique factorization theorem that the factorization of $B$ will contain all the prime ideals in the factorization of $A$, each to at least the same exponent. Hence

$$B = P_1^{f_1} \cdots P_r^{f_r}Q_1 \cdots Q_s$$

with each $f_i \ge e_i$ and $Q_1, \dots, Q_s$ prime ideals. Then

$$C = P_1^{f_1 - e_1} \cdots P_r^{f_r - e_r}Q_1 \cdots Q_s$$

is an integral ideal and $B = AC$.
6.5.2 An Application of Unique Factorization


As we saw in Chapter 2, many results are direct consequences of the Fundamental Theorem of Arithmetic. In a similar manner, as a consequence of the unique factorization theorem for ideals, many of these results have lovely analogs for ideals in algebraic number rings. In this section we will look at one of these, the Chinese Remainder Theorem. Later, after we discuss the ideal class group, an analog of Fermat's theorem will also be presented.

Recall that for the rational integers the Chinese Remainder Theorem is the following.


Theorem 6.5.5 (Chinese Remainder Theorem) Suppose that $m_1, m_2, \dots, m_k$ are $k$ positive integers that are relatively prime in pairs. If $a_1, \dots, a_k$ are any integers then the simultaneous congruences

$$x \equiv a_i \bmod m_i, \quad i = 1, \dots, k,$$

have a common solution which is unique modulo $m_1m_2 \cdots m_k$.
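The proof of the ideal version, Theorem 6.5.7, constructs the solution exactly as in the integer case. As a sketch of that construction in $\mathbb{Z}$ (our own helper; it assumes Python 3.8+ for the modular inverse `pow(x, -1, m)`): with $M_i = \prod_{j \neq i} m_j$, the number $\beta_i' = M_i\,(M_i^{-1} \bmod m_i)$ lies in $\prod_{j \neq i} m_j\mathbb{Z}$ and is $\equiv 1 \bmod m_i$, so $x = \sum_i a_i\beta_i'$ solves every congruence.

```python
def crt(residues, moduli):
    """Solve x = a_i mod m_i for pairwise coprime m_i; return the solution mod m_1...m_k."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(residues, moduli):
        other = M // m                            # product of the other moduli
        beta_prime = other * pow(other, -1, m)    # = 1 mod m, = 0 mod every other modulus
        x += a * beta_prime
    return x % M
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns `23`, and indeed $23 \equiv 2 \bmod 3$, $23 \equiv 3 \bmod 5$, $23 \equiv 2 \bmod 7$.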


To extend this result we need to give the analogs of greatest common divisors (gcds) and least common multiples (lcms) for ideals. Since these concepts are defined in terms of divisibility the definitions are identical.

Definition 6.5.6 If $A$ and $B$ are integral ideals in $\mathcal{O}_K$ then

1. $\gcd(A, B) = D$, where $D$ is an integral ideal such that $D \mid A$, $D \mid B$, and if $D_1$ is another integral ideal such that $D_1 \mid A$ and $D_1 \mid B$ then $D_1 \mid D$.
2. $\operatorname{lcm}(A, B) = L$, where $L$ is an integral ideal such that $A \mid L$, $B \mid L$, and if $A \mid L_1$, $B \mid L_1$ for some integral ideal $L_1$ then $L \mid L_1$.


From the unique factorization theorem it easily follows, in exactly the same manner as for the integers, that if

$$A = P_1^{e_1} \cdots P_r^{e_r} \quad\text{and}\quad B = P_1^{f_1} \cdots P_r^{f_r}$$

with $P_1, \dots, P_r$ distinct prime ideals, $e_i, f_i \ge 0$, and $P^0 = \mathcal{O}_K$, then

$$\gcd(A, B) = P_1^{\min(e_1, f_1)} \cdots P_r^{\min(e_r, f_r)}$$

and

$$\operatorname{lcm}(A, B) = P_1^{\max(e_1, f_1)} \cdots P_r^{\max(e_r, f_r)}.$$

Further, since an ideal is a factor if and only if it is a divisor, that is, $D \mid A$ if and only if $A \subseteq D$, it follows that $\gcd(A, B)$ is the smallest ideal containing both $A$ and $B$, while $\operatorname{lcm}(A, B)$ is the largest ideal contained in both $A$ and $B$. Now the sum $A + B$ is the smallest ideal containing both $A$ and $B$, and the intersection $A \cap B$ is the largest ideal contained in both $A$ and $B$. Hence

$$\gcd(A, B) = A + B, \qquad \operatorname{lcm}(A, B) = A \cap B.$$

Further, exactly as for the rational integers,

$$AB = \gcd(A, B)\cdot\operatorname{lcm}(A, B) = (A + B)(A \cap B).$$


We summarize all these observations in the next theorem. 


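Theorem 6.5.6 Let $A, B$ be integral ideals in $\mathcal{O}_K$ and suppose

$$A = P_1^{e_1} \cdots P_r^{e_r} \quad\text{and}\quad B = P_1^{f_1} \cdots P_r^{f_r}$$

with $P_1, \dots, P_r$ distinct prime ideals, $e_i, f_i \ge 0$, and $P^0 = \mathcal{O}_K$. Then

(1) $\gcd(A, B) = A + B = P_1^{\min(e_1, f_1)} \cdots P_r^{\min(e_r, f_r)}$,
(2) $\operatorname{lcm}(A, B) = A \cap B = P_1^{\max(e_1, f_1)} \cdots P_r^{\max(e_r, f_r)}$,
(3) $AB = (A + B)(A \cap B)$.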
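In $\mathbb{Z}$, the model case, the exponent formulas of Theorem 6.5.6 become the familiar recipe for gcd and lcm from prime factorizations. The Python sketch below (trial-division factoring; the names are our own) computes both from the minimum and maximum exponents, and its usage checks the analog $ab = \gcd(a, b)\,\operatorname{lcm}(a, b)$ of part (3).

```python
from collections import Counter

def factor(n):
    """Prime factorization of n >= 1 as a Counter {p: e}, by trial division."""
    f, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            f[p] += 1
            n //= p
        p += 1
    if n > 1:
        f[n] += 1
    return f

def gcd_lcm(a, b):
    """gcd and lcm via min/max of exponents, as in Theorem 6.5.6 (1) and (2)."""
    fa, fb = factor(a), factor(b)
    g = l = 1
    for p in set(fa) | set(fb):           # a missing prime has exponent 0
        g *= p ** min(fa[p], fb[p])
        l *= p ** max(fa[p], fb[p])
    return g, l
```

For example, `gcd_lcm(12, 18)` returns `(6, 36)`, and $6 \cdot 36 = 216 = 12 \cdot 18$.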


Now to get the Chinese Remainder Theorem we need to extend the concept of relatively prime or coprime. Since $P^0 = \mathcal{O}_K$ we have:

Definition 6.5.7 The integral ideals $A, B$ are relatively prime or coprime if they have no common prime factor. Equivalently they are coprime if $A + B = \mathcal{O}_K$.


We now get: 


Theorem 6.5.7 (Chinese Remainder Theorem for Ideals) Let $\{A_1, \dots, A_n\}$ be a set of integral ideals in $\mathcal{O}_K$ which are pairwise relatively prime, that is, $A_i + A_j = \mathcal{O}_K$ if $i \neq j$, and let $\{\alpha_1, \dots, \alpha_n\}$ be an arbitrary set of algebraic integers in $\mathcal{O}_K$. Then there exists an element $\alpha \in \mathcal{O}_K$ such that

$$\alpha \equiv \alpha_i \bmod A_i \quad\text{for } 1 \le i \le n,$$

and further $\alpha$ is unique modulo $A_1A_2 \cdots A_n$.
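Proof The proof mimics the proof for the rational integers, that is, we actually construct the element $\alpha$ (see Chapter 2).

Since $A_1, \dots, A_n$ are pairwise relatively prime, no prime factor of $A_i$ divides any $A_j$ with $j \neq i$, so by unique factorization $A_i$ is relatively prime to $\prod_{j \neq i} A_j$. Hence for $1 \le i \le n$ there exist elements $\beta_i, \beta_i'$ with $\beta_i \in A_i$ and $\beta_i' \in \prod_{j \neq i} A_j$ such that $\beta_i + \beta_i' = 1$. Now let

$$\alpha = \alpha_1\beta_1' + \alpha_2\beta_2' + \cdots + \alpha_n\beta_n'.$$

Since $\beta_i + \beta_i' = 1$ and $\beta_i \in A_i$ it follows that $\beta_i' \equiv 1 \bmod A_i$. Further $\beta_j' \in A_i$ if $i \neq j$, so $\beta_j' \equiv 0 \bmod A_i$. Therefore

$$\alpha \equiv \alpha_i \bmod A_i \quad\text{for } i = 1, \dots, n.$$

Suppose $\alpha'$ is another simultaneous solution to the given congruences. Then

$$\alpha - \alpha' \in A_1 \cap A_2 \cap \cdots \cap A_n.$$

Since the $A_i$ are pairwise relatively prime, Theorem 6.5.6 and induction on $n$ give

$$A_1 \cap A_2 \cap \cdots \cap A_n = A_1A_2 \cdots A_n,$$

and hence $\alpha \equiv \alpha' \bmod A_1 \cdots A_n$.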
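The construction in this proof can be run verbatim in $\mathcal{O}_K = \mathbb{Z}[i]$ when each modulus is a principal ideal $A_i = \gamma_i\mathbb{Z}[i]$ (no loss in $\mathbb{Z}[i]$, which is a principal ideal domain; other rings would need genuine ideal arithmetic, which this sketch does not attempt). The extended Euclidean algorithm gives $x\gamma_i + y\prod_{j \neq i}\gamma_j = u$ with $u$ a unit, and then $\beta_i' = y u^{-1}\prod_{j \neq i}\gamma_j$ lies in $\prod_{j \neq i} A_j$ and satisfies $\beta_i' \equiv 1 \bmod A_i$. All helper names below are hypothetical.

```python
def mul(x, y):
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])

def norm(x):
    return x[0] * x[0] + x[1] * x[1]

def divmod_g(a, b):
    """a = q*b + r in Z[i] with N(r) < N(b), rounding a/b to the nearest Gaussian integer."""
    n = norm(b)
    t = mul(a, (b[0], -b[1]))                            # a * conj(b) = (a/b) * N(b)
    q = ((2 * t[0] + n) // (2 * n), (2 * t[1] + n) // (2 * n))
    return q, sub(a, mul(q, b))

def ext_gcd(a, b):
    """Return (g, x, y) with x*a + y*b = g, where g is a gcd of a and b in Z[i]."""
    x0, y0, x1, y1 = (1, 0), (0, 0), (0, 0), (1, 0)
    while b != (0, 0):
        q, r = divmod_g(a, b)
        a, b = b, r
        x0, x1 = x1, sub(x0, mul(q, x1))
        y0, y1 = y1, sub(y0, mul(q, y1))
    return a, x0, y0

def in_ideal(x, gamma):
    """x lies in gamma*Z[i] iff x*conj(gamma) is divisible by N(gamma)."""
    n = norm(gamma)
    t = mul(x, (gamma[0], -gamma[1]))
    return t[0] % n == 0 and t[1] % n == 0

def crt_gaussian(residues, moduli):
    """alpha = alpha_i mod gamma_i Z[i] for pairwise coprime gamma_i, as in the proof."""
    total = (0, 0)
    for i, (a, g) in enumerate(zip(residues, moduli)):
        other = (1, 0)
        for j, h in enumerate(moduli):
            if j != i:
                other = mul(other, h)                    # generator of prod_{j != i} A_j
        u, x, y = ext_gcd(g, other)
        if norm(u) != 1:
            raise ValueError("moduli are not pairwise coprime")
        u_inv = (u[0], -u[1])                            # a unit's inverse is its conjugate
        beta_prime = mul(mul(y, other), u_inv)           # in prod_{j != i} A_j, = 1 mod A_i
        total = add(total, mul(a, beta_prime))
    prod = (1, 0)
    for g in moduli:
        prod = mul(prod, g)
    return divmod_g(total, prod)[1]                      # small representative mod A_1...A_n
```

For instance, with moduli $2 + i$, $3$, $1 + i$ (norms $5$, $9$, $2$, hence pairwise coprime) the returned $\alpha$ satisfies all three congruences and has norm below $N((2+i) \cdot 3 \cdot (1+i)) = 90$.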


6.5.3 The Ideal Class Group 


Out of the set of fractional ideals of $K$ we will now form a group, called the ideal class group, which in a sense will measure how close $\mathcal{O}_K$ is to being a principal ideal domain and hence a unique factorization domain. In particular this group will be trivial if and only if $\mathcal{O}_K$ is a principal ideal domain.

First of all, note that fractional ideals can be multiplied exactly as the ordinary integral ideals of $\mathcal{O}_K$. That is, if $A, B$ are fractional ideals with

$$A = \langle \alpha_1, \dots, \alpha_s \rangle, \quad B = \langle \beta_1, \dots, \beta_t \rangle$$

then their product

$$AB = \langle \alpha_1\beta_1, \alpha_1\beta_2, \dots, \alpha_i\beta_j, \dots, \alpha_s\beta_t \rangle$$

is the ideal generated by all products of the generating elements.


Theorem 6.5.8 The fractional ideals of $K$ form an abelian group under the above multiplication, called the ideal group $J_K$ of $K$. The unit element is $\langle 1 \rangle = \mathcal{O}_K$ and the inverse element for a fractional ideal $A$ is

$$A^{-1} = \{x \in K : xA \subseteq \mathcal{O}_K\}.$$

Proof Associativity and commutativity are clear. Further, for any fractional ideal $A$ we have $A\mathcal{O}_K = A$, so $\mathcal{O}_K$ is a unit element. Hence we must show inverses.

If $A$ is an integral ideal then from Lemma 6.5.7 we have $A^{-1}A = \mathcal{O}_K$ with $A^{-1}$ as defined in the theorem. Hence $A^{-1}$ is an inverse for integral ideals. Now let $B$ be a fractional ideal. Multiplying the finitely many generators of $B$ by a common denominator, we find an $\alpha \in \mathcal{O}_K$ with $\alpha \neq 0$ such that $\alpha B \subseteq \mathcal{O}_K$. Then $(\alpha B)^{-1} = \alpha^{-1}B^{-1}$ with $B^{-1}$ as defined above, and hence $BB^{-1} = (\alpha B)(\alpha B)^{-1} = \mathcal{O}_K$.


Corollary 6.5.2. Each fractional ideal $A$ has, up to order, a unique product decomposition

$$A = \prod_{P} P^{e_P}$$

with $e_P \in \mathbb{Z}$, at most finitely many $e_P \neq 0$ (recall $P^0 = \mathcal{O}_K$), where $P$ runs over the set of nonzero prime ideals in $\mathcal{O}_K$.

Proof This mimics the proof that any rational number is a product of rational primes. Each fractional ideal $V$ can be written as a quotient $V = \frac{A}{B} = AB^{-1}$ of two integral ideals $A, B$. Since each of $A, B$ has a unique expression as a product of prime ideals the result follows.


The above corollary can also be phrased as: 


Corollary 6.5.3. The ideal group $J_K$ is a free abelian group generated by the prime ideals $P \neq \langle 0 \rangle$ in $\mathcal{O}_K$.

If $\alpha \in K^* = K \setminus \{0\}$ then $\alpha\mathcal{O}_K$ forms a fractional ideal. Any fractional ideal of this form is called a fractional principal ideal.

Theorem 6.5.9 The set of fractional principal ideals $\{\alpha\mathcal{O}_K\}$ with $\alpha \in K^*$ forms a normal subgroup of the ideal group $J_K$. We denote this subgroup by $P_K$.

Proof Now $(\alpha\mathcal{O}_K)(\beta\mathcal{O}_K) = \alpha\beta\mathcal{O}_K$ and $(\alpha\mathcal{O}_K)^{-1} = \alpha^{-1}\mathcal{O}_K$, so the set of fractional principal ideals is closed under product and inverse. Therefore $P_K$ forms a subgroup. Since the ideal group is abelian any subgroup is normal, and hence $P_K$ is a normal subgroup.

Since $P_K$ is a normal subgroup we can form the factor group.


Definition 6.5.8 The factor group

$$Cl_K = J_K/P_K$$

is called the ideal class group or the class group of $K$.

Let $\mathcal{O}_K^*$ be the group of units of $\mathcal{O}_K$. Then there is an exact sequence

$$1 \to \mathcal{O}_K^* \to K^* \to J_K \to Cl_K \to 1.$$

The following is immediate.

Theorem 6.5.10 $\mathcal{O}_K$ is a principal ideal domain if and only if $Cl_K = \{1\}$.

In general the problem of determining the class group $Cl_K$ is quite complicated.


6.5.4 Norms of Ideals 


We define a norm for an ideal which is related to the norm of an element. Further we 
show that this norm is multiplicative. 


Definition 6.5.9 If $A$ is a nonzero ideal in $\mathcal{O}_K$ then we define the norm of $A$ by

$$N(A) = [\mathcal{O}_K : A].$$

First of all, notice that the norm of a nonzero ideal is always finite since

$$d(A) = [\mathcal{O}_K : A]^2\,d_K,$$

where $d(A)$ is the discriminant of the ideal and $d_K$ is the discriminant of the field.


The following result shows how the norm of an ideal is related to the norm of an 
element. 


Theorem 6.5.11 If $\langle \alpha \rangle$ is a nonzero principal ideal in $\mathcal{O}_K$ then

$$N(\langle \alpha \rangle) = |N_K(\alpha)|.$$


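Proof Suppose $\omega_1, \dots, \omega_n$ is a $\mathbb{Z}$-basis for $\mathcal{O}_K$. Then $\alpha\omega_1, \dots, \alpha\omega_n$ is a $\mathbb{Z}$-basis for $\alpha\mathcal{O}_K$. If $\alpha\omega_i = \sum_{j=1}^{n} a_{ij}\omega_j$ and $A = (a_{ij})$ then

$$|\det(A)| = [\mathcal{O}_K : \alpha\mathcal{O}_K]$$

on one side, while $\det(A) = N_K(\alpha)$ by definition.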
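The two sides of this proof can be compared numerically for $\mathcal{O}_K = \mathbb{Z}[\sqrt{d}]$, which by Exercise 6.36 is the ring of integers when $d$ is squarefree with $d \equiv 2$ or $3 \bmod 4$ (this choice of ring is our assumption for the example). For $\alpha = a + b\sqrt{d}$ and basis $\{1, \sqrt{d}\}$ the matrix has rows $(a, b)$ and $(bd, a)$, so $\det = a^2 - db^2 = N_K(\alpha)$. The sketch below (hypothetical helper name) counts the cosets of $\alpha\mathcal{O}_K$ directly instead of trusting the determinant: a point lies in the lattice spanned by the rows exactly when both Cramer's-rule numerators are divisible by the determinant, so those two residues label the cosets.

```python
def index_of_principal_ideal(a, b, d):
    """Count [Z[sqrt d] : alpha Z[sqrt d]] for alpha = a + b*sqrt(d) != 0; also return det."""
    # Rows: coordinates of alpha*1 and alpha*sqrt(d) in the basis {1, sqrt(d)}.
    (p, q), (r, s) = (a, b), (b * d, a)
    det = p * s - q * r                      # equals a^2 - d*b^2 = N_K(alpha)
    D = abs(det)
    cosets = set()
    # D*e1 and D*e2 lie in the lattice, so the box [0, D)^2 meets every coset.
    for x in range(D):
        for y in range(D):
            # (x, y) = u*(p, q) + v*(r, s) with u = (x*s - y*r)/det, v = (p*y - q*x)/det;
            # these residues vanish exactly on the lattice, so they label the cosets.
            cosets.add(((x * s - y * r) % D, (p * y - q * x) % D))
    return len(cosets), det
```

For $\alpha = 2 + \sqrt{-5}$ this counts $9$ cosets, and $N_K(\alpha) = 4 + 5 = 9$; for the unit $1 + \sqrt{2}$ it counts a single coset.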


Further this norm is multiplicative on the set of ideals. 
Theorem 6.5.12 Let $A$ be a nonzero integral ideal in $\mathcal{O}_K$. If

$$A = P_1P_2 \cdots P_r$$

is the prime ideal decomposition of $A$ then

$$N(A) = N(P_1)N(P_2) \cdots N(P_r).$$

In particular

$$N(AB) = N(A)N(B)$$

for nonzero integral ideals $A, B$.


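Proof Suppose $A$ is a nonzero integral ideal and $A \neq \mathcal{O}_K$. Then $A$ has a canonical prime ideal decomposition

$$A = P_1^{e_1} \cdots P_r^{e_r}, \quad e_i \ge 1,$$

with pairwise different $P_i$. We must show that

$$N(A) = \prod_{i=1}^{r} N(P_i)^{e_i}.$$

By the Chinese Remainder Theorem we have

$$\mathcal{O}_K/A \cong \bigoplus_{i=1}^{r} \mathcal{O}_K/P_i^{e_i},$$

which gives

$$N(A) = \prod_{i=1}^{r} N(P_i^{e_i}).$$

Since $N(P^e) = [\mathcal{O}_K : P][P : P^2] \cdots [P^{e-1} : P^e]$, it remains to show that for each prime ideal $P$ and each natural number $n$ we have $[P^n : P^{n+1}] = N(P)$. For this we choose a $t \in P^n \setminus P^{n+1}$ and consider the homomorphism of abelian groups given by $x \mapsto tx + P^{n+1}$ from $\mathcal{O}_K$ into the factor group $P^n/P^{n+1}$.

The kernel of this map is an ideal in $\mathcal{O}_K$. The kernel does not contain all of $\mathcal{O}_K$ since $t \notin P^{n+1}$, but it does contain $P$ since $tP \subseteq P^{n+1}$. Therefore, since $P$ is maximal, this kernel must be $P$. The image of this homomorphism is the factor group $T/P^{n+1}$, where $T = t\mathcal{O}_K + P^{n+1}$ is an ideal in $\mathcal{O}_K$ contained in $P^n$ but not contained in $P^{n+1}$. Since $T \supseteq P^{n+1}$, $T$ divides $P^{n+1}$ (Lemma 6.5.10), so by unique factorization $T$ is a power of $P$; as $P^{n+1} \subsetneq T \subseteq P^n$, we must have precisely $T = P^n$. The isomorphism theorem for abelian groups then gives

$$\mathcal{O}_K/P \cong P^n/P^{n+1}.$$

Hence in particular

$$N(P) = [\mathcal{O}_K : P] = [P^n : P^{n+1}],$$

completing the proof.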


Suppose $P$ is a nonzero prime ideal in $\mathcal{O}_K$. Then it is a maximal ideal, and hence the factor ring $\mathcal{O}_K/P$ is a field, in fact a finite field since $[\mathcal{O}_K : P]$ is finite. If its characteristic is $p$ then $P \cap \mathbb{Z} = p\mathbb{Z}$, where $p$ is a rational prime. Now $N(P)$ is the number of elements in $\mathcal{O}_K/P$ and therefore $N(P) = p^f$ for some $f \in \mathbb{N}$. This exponent is called the residue class degree of the prime ideal $P$. It is the degree of the field $\mathcal{O}_K/P$ over its prime field $\mathbb{Z}_p$. The multiplicative group $(\mathcal{O}_K/P)^*$, of order $N(P) - 1$, is cyclic, being a finite multiplicative group of a field (see Chapter 2 and the exercises). From this we obtain the analog of Fermat's theorem for ideals in $\mathcal{O}_K$.

Theorem 6.5.13 (Fermat) If $P \neq \langle 0 \rangle$ is a prime ideal in $\mathcal{O}_K$ then

$$\alpha^{N(P)} \equiv \alpha \bmod P$$

for all $\alpha \in \mathcal{O}_K$.
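A quick numerical check of Theorem 6.5.13 in $\mathcal{O}_K = \mathbb{Z}[i]$, where every ideal is principal: for $P = \langle \pi \rangle$ we have $N(P) = \pi\bar{\pi}$ by Theorem 6.5.11, and since this rational integer lies in $P$ it is harmless to reduce coordinates modulo it. The helpers below are our own sketch. The ideal $\langle 2 \rangle = \langle 1+i \rangle^2$, which is not prime, is included as a contrast: there the congruence fails, for instance at $\alpha = 1 + i$.

```python
def mul_mod(x, y, n):
    """Product in Z[i], with both coordinates reduced mod the rational integer n."""
    return ((x[0] * y[0] - x[1] * y[1]) % n, (x[0] * y[1] + x[1] * y[0]) % n)

def pow_mod(x, e, n):
    # square-and-multiply in Z[i] / nZ[i]
    result, base = (1, 0), (x[0] % n, x[1] % n)
    while e:
        if e & 1:
            result = mul_mod(result, base, n)
        base = mul_mod(base, base, n)
        e >>= 1
    return result

def in_ideal(x, pi):
    """x lies in pi*Z[i] iff x*conj(pi) is divisible by N(pi)."""
    n = pi[0] ** 2 + pi[1] ** 2
    t = (x[0] * pi[0] + x[1] * pi[1], x[1] * pi[0] - x[0] * pi[1])   # x * conj(pi)
    return t[0] % n == 0 and t[1] % n == 0

def fermat_holds(pi):
    """Check alpha^N == alpha mod <pi> for all alpha, where N = N(<pi>) = |N(pi)|."""
    n = pi[0] ** 2 + pi[1] ** 2
    for a in range(n):                       # nZ[i] lies in <pi>, so these cover all classes
        for b in range(n):
            y = pow_mod((a, b), n, n)
            if not in_ideal((y[0] - a, y[1] - b), pi):
                return False
    return True
```

The check succeeds for the prime ideals $\langle 3 \rangle$ (norm $9$), $\langle 2+i \rangle$ (norm $5$) and $\langle 1+i \rangle$ (norm $2$), and fails for $\langle 2 \rangle$.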


We saw in Section 6.4.3 that rational primes may decompose further in quadratic integer rings, and that all possible situations can be classified. We now generalize this.

Theorem 6.5.14 (Decomposition of a rational prime) Let $p$ be a rational prime. For a prime ideal $P$ with $P \mid p\mathcal{O}_K$, the exponent $e_P = v_P(p\mathcal{O}_K)$ of $P$ in the prime ideal decomposition of $p\mathcal{O}_K$ is called the ramification index of $P$ over $p$ (in $K$ over $\mathbb{Q}$). Then

$$\sum_{P \mid p\mathcal{O}_K} e_P\,f_P = [K : \mathbb{Q}],$$

where $f_P$ is the residue class degree of $P$.


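Proof Let $n = [K : \mathbb{Q}]$ be the degree of $K$ over $\mathbb{Q}$ and let $p$ be a rational prime. Then

$$N(p\mathcal{O}_K) = |N_K(p)| = p^n.$$

On the other hand, by the Chinese Remainder Theorem $\mathcal{O}_K/p\mathcal{O}_K$ is isomorphic to the direct sum of the factor rings $\mathcal{O}_K/P^{e_P}$, where $P \mid p\mathcal{O}_K$. Hence

$$p^n = |\mathcal{O}_K/p\mathcal{O}_K| = \prod_{P \mid p\mathcal{O}_K} N(P)^{e_P} = \prod_{P \mid p\mathcal{O}_K} p^{f_Pe_P}.$$

Comparing exponents of $p$ gives $\sum_{P \mid p\mathcal{O}_K} e_Pf_P = n$.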
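To see the formula at work, take $K = \mathbb{Q}(\sqrt{d})$ with $d$ squarefree and $d \equiv 2$ or $3 \bmod 4$, so that $\mathcal{O}_K = \mathbb{Z}[\sqrt{d}]$ (Exercise 6.36) and $n = 2$. We assume, without proof here, the Dedekind-Kummer theorem, which is not developed in this chapter: the factorization of $p\mathcal{O}_K$ mirrors that of $x^2 - d$ modulo $p$. Two distinct roots mean $p$ splits into two primes with $e = f = 1$, a double root means $p$ ramifies with $e = 2, f = 1$, and no root means $p$ stays inert with $e = 1, f = 2$. The helper name below is hypothetical.

```python
def decomposition(p, d):
    """List of (e_P, f_P) for pO_K in Z[sqrt(d)], read off from x^2 - d mod p."""
    roots = [r for r in range(p) if (r * r - d) % p == 0]
    if len(roots) == 2:
        return [(1, 1), (1, 1)]   # split: two distinct linear factors
    if len(roots) == 1:
        return [(2, 1)]           # ramified: x^2 - d = (x - r)^2 mod p
    return [(1, 2)]               # inert: x^2 - d irreducible mod p
```

In every case $\sum e_Pf_P = 2 = [K : \mathbb{Q}]$; for example $5 = (2+i)(2-i)$ splits in $\mathbb{Z}[i]$, $3$ stays inert, and $2 = -i(1+i)^2$ ramifies.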


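Finally we show that there are only finitely many elements $\alpha$ in $\mathcal{O}_K$ of a given norm.

Theorem 6.5.15 Up to units there are only finitely many elements $\alpha \in \mathcal{O}_K$ with a given norm $N_K(\alpha) = a$.

Proof Let $a$ be a rational integer with $a > 1$. We first claim that in each of the finitely many residue classes of $\mathcal{O}_K/a\mathcal{O}_K$ there is, up to units, at most one element $\alpha$ with $|N_K(\alpha)| = a$.

To see this suppose $\beta = \alpha + a\gamma$ with $\gamma \in \mathcal{O}_K$ is another element with $|N_K(\beta)| = a$. Since $a = \pm N_K(\alpha)$ and $N_K(\alpha)/\alpha \in \mathcal{O}_K$ (it is the product of the other conjugates of $\alpha$), we get

$$\frac{\beta}{\alpha} = 1 + \frac{a}{\alpha}\gamma \in \mathcal{O}_K.$$

Analogously

$$\frac{\alpha}{\beta} = 1 - \frac{a}{\beta}\gamma \in \mathcal{O}_K.$$

This implies that $\alpha, \beta$ are associates, that is, $\alpha = \varepsilon\beta$ with $\varepsilon$ a unit.

It follows that up to units there are at most $[\mathcal{O}_K : a\mathcal{O}_K] = a^n$ elements in $\mathcal{O}_K$ with norm $\pm a$, where $n = [K : \mathbb{Q}]$.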
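For $\mathcal{O}_K = \mathbb{Z}[i]$, where $N(x + yi) = x^2 + y^2$ and the units are $\pm 1, \pm i$, the elements of a given norm can simply be listed. The sketch below (names are our own) returns one representative per class of associates; the count stays far below the bound $[\mathcal{O}_K : a\mathcal{O}_K] = a^2$ from the proof.

```python
import math

def normalize(z):
    """The associate of z != 0 with real part > 0 and imaginary part >= 0."""
    for _ in range(4):
        if z[0] > 0 and z[1] >= 0:
            return z
        z = (-z[1], z[0])      # multiply by the unit i
    raise ValueError("z must be nonzero")

def elements_of_norm(a):
    """Gaussian integers x + yi with x^2 + y^2 = a, one per class of associates."""
    classes = set()
    r = math.isqrt(a)
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if x * x + y * y == a:
                classes.add(normalize((x, y)))
    return sorted(classes)
```

For example, norm $5$ gives the two classes of $1 + 2i$ and $2 + i$, norm $25$ the three classes of $3 + 4i$, $4 + 3i$ and $5$, and norm $3$ none at all.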


6.5.5 Class Number 


In this final section we show that the ideal class group must be finite giving another 
finite integer invariant for each number field. 

The Minkowski Theory (see Section 6.4.5) leads to the following which we state 
without proof (see [Ne]). 


Theorem 6.5.16 Each ideal $A \neq \langle 0 \rangle$ in $\mathcal{O}_K$ contains an element $\alpha \in A$, $\alpha \neq 0$, with

$$|N_K(\alpha)| \le \left(\frac{2}{\pi}\right)^{s}\sqrt{|d_K|}\,N(A),$$

where as before $s$ denotes the number of pairs of complex, non-real embeddings of $K$ into $\mathbb{C}$.

Using this result we obtain:

Theorem 6.5.17 For each algebraic number field $K$ the ideal class group

$$Cl_K = J_K/P_K$$

is finite. Its order $h_K = [J_K : P_K]$ is called the class number of $K$.
Proof Let $P \neq (0)$ be a prime ideal in $\mathcal{O}_K$ and suppose $P \cap \mathbb{Z} = p\mathbb{Z}$ with $p$ a rational prime. Then $\mathcal{O}_K/P$ is a finite extension of its prime field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ of degree $f \ge 1$. Hence $N(P) = p^f$.

For a fixed rational prime $p$ there are only finitely many prime ideals $P$ with $P \cap \mathbb{Z} = p\mathbb{Z}$, since then $P \mid p\mathcal{O}_K$. Therefore there are only finitely many prime ideals $P$ with bounded absolute norm. Now each nonzero integral ideal $A$ has a prime ideal decomposition

$$A = P_1^{e_1} \cdots P_r^{e_r} \quad\text{with } e_i \ge 1,$$

and then we have

$$N(A) = N(P_1)^{e_1} \cdots N(P_r)^{e_r}.$$

Putting this all together, there are only finitely many ideals $A \neq (0)$ in $\mathcal{O}_K$ with bounded absolute norm $N(A) \le M$.

Hence it is enough to show that each class $[A] \in Cl_K$ contains an integral ideal $A_1$ with

$$N(A_1) \le M = \left(\frac{2}{\pi}\right)^{s}\sqrt{|d_K|},$$

where $s$ is as in Theorem 6.5.16.

To show this, choose an arbitrary representative $A \neq (0)$ in this class and a nonzero $\gamma \in \mathcal{O}_K$ with $B = \gamma A^{-1} \subseteq \mathcal{O}_K$. By Theorem 6.5.16 there exists an $\alpha \in B$ with $\alpha \neq 0$ such that

$$|N_K(\alpha)|\,N(B)^{-1} = N\big((\alpha\mathcal{O}_K)B^{-1}\big) = N(\alpha B^{-1}) \le M.$$

The ideal $A_1 = \alpha B^{-1} = \alpha\gamma^{-1}A \in [A]$ is integral since $\alpha \in B$, and it has the desired property.
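The bound $M$ in this proof is easy to evaluate for quadratic fields. The Python sketch below assumes two standard facts not proved in this section: the discriminant of $\mathbb{Q}(\sqrt{m})$, for squarefree $m \neq 0, 1$, is $m$ if $m \equiv 1 \bmod 4$ and $4m$ otherwise; and an imaginary quadratic field has $s = 1$, a real one $s = 0$. The function names are ours.

```python
import math

def bound(d_K, s):
    """M = (2/pi)^s * sqrt(|d_K|), the constant from Theorem 6.5.16."""
    return (2 / math.pi) ** s * math.sqrt(abs(d_K))

def quadratic_discriminant(m):
    """d_K of Q(sqrt(m)) for squarefree m != 0, 1 (standard formula, assumed here)."""
    return m if m % 4 == 1 else 4 * m

for m in (-1, -3, -5, 2):
    s = 1 if m < 0 else 0                  # one pair of complex embeddings iff m < 0
    d_K = quadratic_discriminant(m)
    print(m, d_K, round(bound(d_K, s), 3))
```

For $\mathbb{Q}(i)$ one gets $d_K = -4$ and $M = 4/\pi < 2$, so every ideal class contains an integral ideal of norm $1$, namely $\mathcal{O}_K$, and $h_K = 1$. For $\mathbb{Q}(\sqrt{-5})$, $M = (2/\pi)\sqrt{20} < 3$, so every class is represented by an integral ideal of norm $1$ or $2$.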


We remarked before that an algebraic number ring $\mathcal{O}_K$ is a principal ideal domain if and only if its ideal class group is trivial. Hence in the present language we can say that $\mathcal{O}_K$ is a principal ideal domain if and only if the class number of $K$ is $1$.

For imaginary quadratic number fields $\mathbb{Q}(\sqrt{-d})$, Heegner, Stark, and Baker proved the following.

Theorem 6.5.18 Let $K = \mathbb{Q}(\sqrt{-d})$, where $d$ is a squarefree positive integer. Then $K$ has class number $1$, that is $h_K = 1$, if and only if

$$d = 1, 2, 3, 7, 11, 19, 43, 67, 163.$$
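Theorem 6.5.18 can be checked numerically for small $d$ with a technique not developed in this chapter: for an imaginary quadratic field, $h_K$ equals the number of reduced primitive positive definite binary quadratic forms $ax^2 + bxy + cy^2$ of discriminant $b^2 - 4ac = d_K$ (a classical result going back to Gauss, assumed here). Such a computation only covers $d$ below a chosen bound; the hard part of the theorem is that no larger $d$ occur. Function names are our own.

```python
from math import gcd, isqrt

def class_number(D):
    """Number of reduced primitive forms ax^2 + bxy + cy^2 with b^2 - 4ac = D < 0."""
    h, a = 0, 1
    while 3 * a * a <= -D:                 # reduced forms satisfy a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):     # |b| <= a, excluding b = -a
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (a == c and b < 0):
                continue                   # not reduced
            if gcd(gcd(a, abs(b)), c) == 1:
                h += 1                     # count primitive forms only
        a += 1
    return h

def field_discriminant(d):
    """d_K for K = Q(sqrt(-d)), d > 0 squarefree (standard formula, assumed here)."""
    return -d if d % 4 == 3 else -4 * d

def is_squarefree(d):
    return all(d % (k * k) for k in range(2, isqrt(d) + 1))

class_one = [d for d in range(1, 200)
             if is_squarefree(d) and class_number(field_discriminant(d)) == 1]
```

Here `class_one` comes out as `[1, 2, 3, 7, 11, 19, 43, 67, 163]`, in agreement with the theorem, while for example $\mathbb{Q}(\sqrt{-5})$ ($d_K = -20$) has class number $2$.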


For more on this see [Ri 3]. We end with the following conjecture. 


Conjecture 6.5.19 There are infinitely many algebraic number fields with class 
number one. 
6.6 Exercises


6.1 Show that in any ring $R$ with identity $1$ (commutative or not), if $uv = 1$ and $wu = 1$ then $v = w$. Hence if an element has both a left and a right inverse it is a unit.


6.2 Let $T$ be an $n \times n$ matrix over a field $F$. Suppose $TU = I$ for some matrix $U$. Show that $UT = I$ also.

(Hint: Consider $T$ as a linear transformation. If $TU = I$ it must have rank $n$. Hence there exists a matrix $V$ such that $VT = I$. Apply Problem 6.1.)


6.3 Show that the set of units in a commutative ring R with identity forms an 
abelian group under multiplication. 


6.4 Show that if $a \in \mathbb{Z}_n$ then $a$ is a unit if and only if $(a, n) = 1$.


6.5 Show that in any UFD there are infinitely many primes. (Hint: Use Euclid's proof.)


6.6 Prove Lemma 6.2.1. Let $F$ be a field and let $P(x) \neq 0$, $Q(x) \neq 0$ be nonzero polynomials in $F[x]$. Then:

1. $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$.
2. $\deg(P(x) + Q(x)) \le \max(\deg P(x), \deg Q(x))$ if $P(x) + Q(x) \neq 0$.


6.7 Let $F$ be a field and $F[x]$ the set of polynomials over $F$. Verify the ring properties for $F[x]$.


6.8 Fill in the details for a proof of the division algorithm in F [x]. (Hint: Consider 
the degrees of the polynomials.) 


6.9 Let $S$ be a subring of the field $F$ (such as $\mathbb{Z}$ in $\mathbb{R}$). Let $S[x]$ consist of the polynomials in $F[x]$ with coefficients from $S$. Show that $S[x]$ is a subring of $F[x]$. Recall that to show a subset is a subring we must only show that it is nonempty and closed under addition, subtraction, and multiplication.


6.10 Use the division algorithm to find the quotient and remainder for the following pairs of polynomials in the indicated polynomial rings.

(a) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{R}[x]$.

(b) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{Z}_5[x]$.

(c) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{Z}_{13}[x]$.
6.11 Use the Euclidean algorithm to find the gcd of the following pairs of polynomials in $\mathbb{Q}[x]$.

(a) $f(x) = 2x^3 - 4x^2 + x - 2$, $g(x) = x^3 - x^2 - x - 2$.

(b) $f(x) = x^4 + x^3 + x^2 + x + 1$, $g(x) = x^2 - 1$.


6.12 Show that if $f(x) \in \mathbb{R}[x]$ and $\alpha \in \mathbb{C}$ is a root then $\bar{\alpha}$, its complex conjugate, is also a root.

6.13 Use the Fundamental Theorem of Algebra coupled with Problem 6.12 to show that if $p(x) \in \mathbb{R}[x]$ is irreducible then $p(x)$ is of degree 1 or of degree 2.


6.14 Prove Lemma 6.2.8: Let $R$ be a Euclidean domain and let $r_1, r_2 \in R$. Then any two gcds of $r_1, r_2$ are associates. Further an associate of a gcd of $r_1, r_2$ is also a gcd.

6.15 Prove Lemma 6.2.9: Suppose that $R$ is a Euclidean domain and $r_1, r_2 \in R$ with $r_2 \neq 0$. Then a gcd $d$ for $r_1, r_2$ exists and is expressible as a linear combination with minimal norm. That is, there exist $x, y \in R$ with

$$d = r_1x + r_2y$$

and $N(d) \le N(d_1)$ for any other nonzero linear combination $d_1$ of $r_1, r_2$. Further if $r_1 \neq 0$, $r_2 \neq 0$ then a gcd can be found by the Euclidean algorithm exactly as in $\mathbb{Z}$ and $F[x]$. (Hint: Mimic the proof in the ordinary integers $\mathbb{Z}$.)


6.16 Suppose $D$ is a Euclidean domain and assume $r \in D$ has two prime factorizations

$$r = r_1r_2 \cdots r_k = s_1s_2 \cdots s_t$$

with $r_1, \dots, r_k, s_1, \dots, s_t$ all primes in $D$. Show that each $r_i$ is an associate of some $s_j$ and $k = t$. (Hint: Use Euclid's Lemma repeatedly.)


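6.17 Prove Lemma 6.2.11: If $\alpha, \beta \in \mathbb{Z}[i]$ then:

1. $N(\alpha)$ is an integer for all $\alpha \in \mathbb{Z}[i]$.
2. $N(\alpha) \ge 0$ for all $\alpha \in \mathbb{Z}[i]$.
3. $N(\alpha) = 0$ if and only if $\alpha = 0$.
4. $N(\alpha) \ge 1$ for all $\alpha \neq 0$.
5. $N(\alpha\beta) = N(\alpha)N(\beta)$, that is, the norm is multiplicative.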


6.18 (a) Find the gcd and lcm of the Gaussian integers $5 + 3i$ and $6 - 4i$.
(b) Determine if $1 + 4i$ and $13i$ are primes or not in $\mathbb{Z}[i]$.
(c) Determine the prime decomposition in $\mathbb{Z}[i]$ of $3 + 5i$.
6.19 Solve the congruence in $\mathbb{Z}[i]$:

$$(2 + 3i)x \equiv 1 \bmod (1 + 3i).$$

6.20 Suppose that $p(x) = a_kx^k + \cdots + a_0 \in \mathbb{Z}[x]$ and $p(r) = 0$ with $r = \frac{m}{n} \in \mathbb{Q}$ in lowest terms. Show that $m \mid a_0$ and $n \mid a_k$. (This is called the rational root theorem.)


6.21 Use the rational root theorem coupled with polynomial factorization to show 
that 
p(x)=x?—x4+5 


is irreducible over Q. 


6.22 Use the multiplicativity of the norm to show that in $\mathbb{Z}[\sqrt{-5}]$ the numbers $3, 7, 1 + 2\sqrt{-5}, 1 - 2\sqrt{-5}$ are all primes and not associates of each other. Recall that $N(a + b\sqrt{-5}) = a^2 + 5b^2$.

Since $21 = 3 \cdot 7 = (1 + 2\sqrt{-5})(1 - 2\sqrt{-5})$ this shows that prime factorization is not unique in $\mathbb{Z}[\sqrt{-5}]$.


6.23 Prove that any Euclidean domain is a principal ideal domain. (Hint: Let $I \subseteq D$, with $I \neq \{0\}$, be an ideal with $D$ a Euclidean domain. Let $r \in I$ be a nonzero element of minimal norm. Mimic the proof in $\mathbb{Z}$ to show that $I = (r)$.)

6.24 Show that the following properties hold in a PID $R$.
(i) $a \mid b$ if and only if $\langle b \rangle \subseteq \langle a \rangle$.
(ii) $\langle b \rangle = \langle c \rangle$ if and only if $b$ and $c$ are associates.
(iii) $\langle a \rangle = R$ if and only if $a$ is a unit.


6.25 Prove that if R is a UFD then the polynomial ring R[x] is also a UFD. 


6.26 Let $F$ be a field and $I$ the set of polynomials in $F[x, y]$ with constant term $0$. Show that this forms an ideal which is not principal.

6.27 Let $R$ be an integral domain and $I \subseteq R$ an ideal. Show that $r_1 \sim r_2$ if $r_1 - r_2 \in I$ defines an equivalence relation on $R$. (Since the equivalence classes are the cosets of $I$ this shows that the cosets partition $R$.)

6.28 Suppose $F$ is a field and $p(x) \in F[x]$ is irreducible. Then show that if $\bar{x} = x + \langle p(x) \rangle$ in the factor ring

$$F' = F[x]/\langle p(x) \rangle$$

then $p(\bar{x}) = \langle p(x) \rangle$, the zero element of $F'$. (Consider the operations in $F'$.)
6.29 Prove Lemma 6.3.1: If $F \subseteq F' \subseteq F''$ are fields with $F''$ a finite extension of $F$, then $|F' : F|$ and $|F'' : F'|$ are also finite, and

$$|F'' : F| = |F'' : F'|\,|F' : F|.$$

6.30 Show that if $F \subseteq F'$ are fields and $\alpha \in F'$ then the intersection of all subfields of $F'$ containing both $\alpha$ and $F$ is again a subfield.


6.31 Let $K$ be an algebraic number field of degree $n$. On the set of $n$ embeddings $K \to \mathbb{C}$ fixing $\mathbb{Q}$ define the relation $\sigma \sim \tau$ if $\sigma(\alpha) = \tau(\alpha)$ for $\alpha \in K$. Show that this is an equivalence relation.


6.32 Let a € R be algebraic over Q and ( be transcendental. Show that a + 
B, 8, 5 are all transcendental. 


6.33 Let $F$ be a field and $x_0, x_1, \dots, x_n$ be $n + 1$ distinct elements of $F$. Prove that the Vandermonde determinant has the value

$$V(x_0, \dots, x_n) = \det\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix} = \prod_{0 \le j < i \le n} (x_i - x_j).$$

(Hint: Use the following steps.)
(i) Show that it is true for $n = 2$.
(ii) Let $V_n(x) = V(x_0, \dots, x_{n-1}, x)$ with $x$ as a variable. Show that $V_n(x)$ is a polynomial of degree $n$ with roots $x_0, \dots, x_{n-1}$.
(iii) Use part (ii) to show that

$$V_n(x) = V(x_0, \dots, x_{n-1})(x - x_0) \cdots (x - x_{n-1}).$$

(iv) Substitute $x_n$ to complete the induction and the proof.

6.34 Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$. For $\alpha \in K$ define the mapping $T_\alpha : K \to K$ by

$$T_\alpha(x) = \alpha x.$$

Show that this is a linear transformation of the $n$-dimensional $\mathbb{Q}$-vector space $K$.

6.35 A primitive integral polynomial is a polynomial $p(x) \in \mathbb{Z}[x]$ such that the gcd of all its coefficients is $1$. Prove the following:

(a) If $f(x)$ and $g(x)$ are primitive then so is $f(x)g(x)$.
(b) If $f(x)$ is monic then it is primitive.




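(c) If $f(x) \in \mathbb{Q}[x]$ then there exists a rational number $c$ such that $f(x) = cf_1(x)$ with $f_1(x)$ primitive.

6.36 Let $K = \mathbb{Q}(\sqrt{d})$ with $d$ squarefree. Let $\omega = \sqrt{d}$ if $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$, and let $\omega = \frac{1 + \sqrt{d}}{2}$ if $d \equiv 1 \bmod 4$. Show that every element of $\mathcal{O}_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and that $\{1, \omega\}$ is an integral basis.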


6.37 Let $K = \mathbb{Q}(\sqrt{-3})$ and $\omega = \frac{-1 + \sqrt{-3}}{2}$. Show that $\pm\omega$, $\pm\bar{\omega}$ are units in $\mathcal{O}_K$. (Note that $\omega^3 = 1$.)


6.38 Complete the proof of Theorem 6.5.1, that is that A does indeed have an 
integral basis. (Hint: Mimic the proof of Theorem 6.4.4.) 


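6.39 Show that the product of two ideals is independent of the generating system, that is, if $A = \langle \alpha_1, \dots, \alpha_s \rangle$, $B = \langle \beta_1, \dots, \beta_t \rangle$ are ideals in $\mathcal{O}_K$ and also $A = \langle \alpha_1', \dots, \alpha_{s'}' \rangle$, $B = \langle \beta_1', \dots, \beta_{t'}' \rangle$, then

$$\langle \alpha_1\beta_1, \alpha_1\beta_2, \dots, \alpha_i\beta_j, \dots, \alpha_s\beta_t \rangle = \langle \alpha_1'\beta_1', \alpha_1'\beta_2', \dots, \alpha_i'\beta_j', \dots, \alpha_{s'}'\beta_{t'}' \rangle.$$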


6.40 Prove that the sum of fractional ideals is again a fractional ideal. 


6.41 Express the symmetric polynomial f (x1, x2, x3) = ei + i + xe as a poly- 
nomial in the elementary symmetric polynomials 51, 52, 53. 


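6.42 Find the minimal polynomial of $\sqrt{2} + \sqrt{3}$ over $\mathbb{Q}$. (How do you know it is algebraic?) (Hint: $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ has degree 4 over $\mathbb{Q}$ and hence $\sqrt{2} + \sqrt{3}$ has degree 2 or degree 4 over $\mathbb{Q}$. Show that it cannot have degree 2.)

6.43 Let $p$ be a prime and $a$ a rational number which is not a $p$th power. Let $K = \mathbb{Q}(\sqrt[p]{a})$. Show that if $K_1$ is a field with $\mathbb{Q} \subseteq K_1 \subseteq K$ then either $K_1 = \mathbb{Q}$ or $K_1 = K$.

6.44 Let $\alpha_1, \dots, \alpha_n$ be algebraic integers in $K$. Show that if $\alpha_1, \dots, \alpha_n$ is a basis for $K$ over $\mathbb{Q}$ and $\Delta(\alpha_1, \dots, \alpha_n)$ is squarefree then $\alpha_1, \dots, \alpha_n$ is an integral basis.

6.45 Let $\alpha, \beta$ be algebraic integers in $K$ and $\langle \alpha \rangle$, $\langle \beta \rangle$ the principal ideals they generate. Show that if $\langle \alpha \rangle \mid \langle \beta \rangle$ then $\alpha \mid \beta$.

6.46 Classify the algebraic number fields $K$ with discriminant

$$-100 < d_K < 100.$$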


Chapter 7
The Fields $\mathbb{Q}_p$ of p-Adic Numbers: Hensel's Lemma


7.1 The p-Adic Fields and p-Adic Expansions 


In the previous chapter, we described algebraic extensions of the rational numbers. We then saw that the arithmetic of the integers within these algebraic number fields was similar to that of the ordinary integers, and further that many algebraic number fields allow unique factorization of elements, while all of these fields allow unique factorization of ideals.

In this chapter, we look at a separate type of extension of the rational field, motivated by both analysis and algebra. For each prime p, we will get a new field called the field of p-adic numbers, denoted by $\mathbb{Q}_p$. These fields will be constructed in a manner analogous to the way the real number system R is constructed from Q. The p-adic numbers can be used to consider and study congruences modulo p and modulo $p^n$, and they have many applications in classical number theory. In particular, they were used in the proof of Fermat’s Last Theorem by A. Wiles (see [W]).

The p-adic numbers were first developed by Kurt Hensel in 1897 and for each 
prime p they can be considered as a completion of the rational numbers. To under- 
stand this, let us recall some facts about the real number system. We will go deeply 
into these in the next section. The real numbers have the property that every Cauchy 
sequence (see Section 7.2) of real numbers has a limit. This is not true for the rational 
numbers. Because of this we say that the real numbers are complete. Further each 
real number is actually the limit of a sequence of rationals. We say that Q is dense 
in R and that R is the completion of Q. 

Convergence of sequences in R and Q depends upon measuring distance. For the standard approach we measure distance in terms of absolute value, that is, if $r, s \in \mathbb{R}$ then $d(r, s) = |r - s|$. We say that absolute value is a norm on the field R and the reals are a normed field. What Hensel and others noticed is that the completion procedure that produces R from Q can be carried out for any normed field. A norm can be placed on Q depending on a given prime p, and the resulting normed field can be completed just as Q was completed to R. The resulting field is the field of p-adic numbers. The actual details will be given in Section 7.2.
To do p-adic arithmetic we must recall the p-ary expansion of real numbers. Any real number r can be expressed as a decimal expansion
$$r = \pm \sum_{i=-\infty}^{n} a_i 10^i$$
where $a_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and there are finitely many decimal places to the left of the decimal point and possibly infinitely many to the right.

Although in common practice we use a decimal expansion, that is, base 10, in reality any base $m \in \mathbb{N}$ with $m \ge 2$ can be used. Historically, we use base 10 because we have 10 fingers, or digits. We have the following theorem:


Theorem 7.1.1 Let $r \in \mathbb{R}$ and $m \in \mathbb{N}$ with $m \ge 2$. Then r can be expressed as
$$r = \pm \sum_{i=-\infty}^{n} a_i m^i$$
where $a_i \in \{0, 1, \ldots, m-1\}$.


The expansion in Theorem 7.1.1 is called the m-ary expansion. We give an 
example for an integer in base 5. 

EXAMPLE 7.1.1 Determine the 5-ary expansion of 371. 

The method uses the division algorithm and is related to the Euclidean algorithm. 
We first find the highest power of 5 that is less than 371. This is $5^3 = 125$. We then use the division algorithm to obtain
$$371 = (2)(125) + 121 = 2 \cdot 5^3 + 121.$$
We now repeat the process with 121, and then with the remainder 21, to obtain
$$371 = 2 \cdot 5^3 + 4 \cdot 5^2 + 21 = 2 \cdot 5^3 + 4 \cdot 5^2 + 4 \cdot 5 + 1.$$
This gives the 5-ary expansion. We write this as $(2441)_5$. Writing 371 without a base indicates the standard base 10.
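The division-algorithm procedure of Example 7.1.1 can be sketched in Python as follows. Rather than starting from the highest power of m, this version repeatedly divides by m and collects the remainders, which yields the same digits in reverse order; the function name `to_base` is our own.

```python
from typing import List


def to_base(n: int, m: int) -> List[int]:
    """Return the base-m digits of a nonnegative integer n, most significant digit first."""
    if m < 2:
        raise ValueError("the base m must be at least 2")
    if n < 0:
        raise ValueError("n must be nonnegative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        # division algorithm: n = m*q + r with 0 <= r < m; r is the next digit
        n, r = divmod(n, m)
        digits.append(r)
    return digits[::-1]  # digits were produced least significant first


print(to_base(371, 5))  # [2, 4, 4, 1]
```

The output matches $(2441)_5$.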

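Arithmetic can be done exactly as with standard decimal expansions, except that carries are made in base m: a column whose digit sum is at least m contributes that sum modulo m and carries the quotient by m into the next column.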

EXAMPLE 7.1.2 Add the numbers $(2441)_5$ and $(3244)_5$.

Here, we write
$$\begin{array}{r} 2441 \\ +\;3244 \\ \hline 11240 \end{array}$$
In base 10 this is $1 \cdot 5^4 + 1 \cdot 5^3 + 2 \cdot 5^2 + 4 \cdot 5 + 0 = 820$. In base 10, $(2441)_5 = 371$ and $(3244)_5 = 449$. We have $371 + 449 = 820$ and, as they must, the additions agree.
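Column addition with carries, as in Example 7.1.2, might be sketched in Python as follows. It assumes numerals are given as digit lists with the most significant digit first; the helper names `add_base` and `from_base` are hypothetical.

```python
from typing import List


def add_base(a: List[int], b: List[int], m: int) -> List[int]:
    """Add two base-m numerals given as digit lists, most significant digit first."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += a[i]
            i -= 1
        if j >= 0:
            total += b[j]
            j -= 1
        carry, digit = divmod(total, m)  # write total mod m, carry the quotient
        result.append(digit)
    return result[::-1] or [0]


def from_base(digits: List[int], m: int) -> int:
    """Value of a base-m numeral, evaluated by Horner's rule."""
    value = 0
    for d in digits:
        value = value * m + d
    return value


s = add_base([2, 4, 4, 1], [3, 2, 4, 4], 5)
print(s, from_base(s, 5))  # [1, 1, 2, 4, 0] 820
```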

Base 2 expansions are called binary expansions. Because these only use two 
digits, 0, 1, binary expansions become extremely important in representing numbers 
on a computer. The digit 0 can be represented by an open electrical gate and the digit 1 by a closed gate. Thus, any integer can be represented as a sequence of open and closed circuits.


7.2 The Construction of the Real Numbers 


The construction of the p-adic fields is entirely analogous to the construction of the 
real numbers from the rational numbers. What differs is the way distance is measured. 
We first describe the construction of R. 


7.2.1 The Completeness of Real Numbers 


There are several different constructions of the real number system R starting with 
the rational numbers Q. The two best known are the Dedekind cut construction and 
the Cauchy completion procedure. For our purposes in studying the p-adic fields the second is the more important. We recall first some basic facts about sequences and
completeness in R and then in the next subsection show how R can be constructed 
starting from Q. 

The analytic properties of the real numbers depend upon distance which in turn 
depends upon absolute value. Recall that if $x \in \mathbb{R}$ then its absolute value is defined by
$$|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$


Lemma 7.2.1 We have the following properties for absolute value:
1. $|x| \ge 0$ and $|x| = 0$ iff $x = 0$;
2. $|xy| = |x||y|$;
3. $|x + y| \le |x| + |y|$ (triangle inequality).


Absolute value forms a norm on R and we say that R is a normed field. 
Absolute value allows us to define distance on R. In particular, if $x, y \in \mathbb{R}$ then $d(x, y) = |x - y|$. This then satisfies the common properties of a metric:
1. $d(x, y) \ge 0$ and $d(x, y) = 0$ if and only if $x = y$;
2. $d(x, y) = d(y, x)$;
3. $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality).


The completion of Q depends upon the convergence of sequences. 


Definition 7.2.1 A sequence $(x_n)$ in R converges, or has a limit $x \in \mathbb{R}$, denoted $x_n \to x$, if for all $\varepsilon > 0$ there exists an $N = N(\varepsilon)$ such that $|x_n - x| < \varepsilon$ for all $n \ge N$. If a sequence has no limit we say it diverges.


The following is clear but important. 


Lemma 7.2.2 The limit of a sequence is unique; that is, if $\lim x_n = x$ and $\lim x_n = y$ then $x = y$.


Crucial to our construction are Cauchy sequences. 


Definition 7.2.2 A sequence $(x_n)$ is a Cauchy sequence if for all $\varepsilon > 0$ there exists an $N = N(\varepsilon)$ such that $|x_n - x_m| < \varepsilon$ for all $n, m \ge N$. This means the terms of the sequence cluster close to each other after a certain point.


Roughly, in a convergent sequence all the terms of the sequence after a certain point are close to the limit. In a Cauchy sequence, all the terms after a certain point are close to each other. A convergent sequence must be a Cauchy sequence: if $x_n \to x$, then $|x_n - x_m| \le |x_n - x| + |x - x_m|$, and both terms on the right are small for large n and m.
However within the rationals there are Cauchy sequences that do not converge within 
Q as the next example shows. 

EXAMPLE 7.2.1.1 Consider $x = \sqrt{2}$. This is an irrational number so it has a non-repeating infinite decimal expansion
$$x = \sqrt{2} = 1.414\ldots.$$
Let $x_1 = 1$, $x_2 = 1.4$ and in general let $x_n$ be the $(n-1)$-st decimal approximation of $\sqrt{2}$.

Now $(x_n)$ is a sequence of rational numbers. Within R we have $\lim x_n = x$, so within R the sequence $(x_n)$ is a convergent sequence and hence a Cauchy sequence. Since distance is measured the same way in Q as in R, this is also a Cauchy sequence in Q. However, there is no limit within Q, since limits of sequences are unique and $x \notin \mathbb{Q}$.
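The sequence in Example 7.2.1.1 can be generated with exact rational arithmetic. The following Python sketch (the function name is our own) uses integer square roots to form the truncations $x_n$. On the first ten terms only, it checks the Cauchy estimate $|x_n - x_m| < 10^{-(N-1)}$ for $n, m \ge N$, which holds because $x_N \le x_n \le \sqrt{2} < x_N + 10^{-(N-1)}$.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(n: int) -> Fraction:
    """x_n = sqrt(2) truncated to n - 1 decimal places, as an exact rational number."""
    scale = 10 ** (n - 1)
    # isqrt(2 * scale**2) equals floor(sqrt(2) * scale), computed with integers only
    return Fraction(isqrt(2 * scale * scale), scale)


xs = [sqrt2_truncation(n) for n in range(1, 11)]
print([str(x) for x in xs[:3]])  # ['1', '7/5', '141/100']

# Cauchy property on the computed terms: for n, m >= N the terms differ by
# less than 10**-(N - 1).
for N in range(1, 11):
    tail = xs[N - 1:]
    assert max(tail) - min(tail) < Fraction(1, 10 ** (N - 1))

# No term squares to 2, as expected since sqrt(2) is irrational.
assert all(x * x < 2 for x in xs)
```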
Within the real number system, however, the following theorem holds.


Theorem 7.2.1 A sequence in R converges if and only if it is a Cauchy sequence. 


As we will see in the next section, this theorem is a direct consequence of the 
construction of R starting with Q. In most analysis courses a proof of this theorem 
depends on the least upper bound property, which we introduce below. 

Because of the above theorem, we say that R is complete. Geometrically the 
completeness of R is essentially equivalent to the fact that R is in 1-1 correspondence 
with the points on a line. Further, convergence and completeness allow us to define 
and study all the analytic properties of functions—continuity, differentiability, and 
integrability. 
The real numbers R are an ordered field. That is, on R there is an ordering, compatible with addition and multiplication, such that if $a, b \in \mathbb{R}$ then exactly one of $a < b$, $a = b$, $a > b$ holds. The common order properties of R can be defined for any ordered field.


Lemma 7.2.3 In any ordered field F, squares of nonzero elements must be positive, that is, $x^2 > 0$ for all $x \neq 0$. In particular, in R the equation $x^2 + 1 = 0$ has no solution.


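Proof Let $x \in F$ with $x \neq 0$. Then either $x > 0$ or $-x > 0$. If $x > 0$ then $x^2 = xx > 0$ since the positive elements are closed under multiplication. If $x < 0$ then $-x > 0$ and $x^2 = xx = (-x)(-x)$ by the laws of signs. But then $(-x)(-x) > 0$ and hence $x^2 > 0$. Consequently $x^2 + 1 > 0$ for every x, since $x^2 \ge 0$ and $1 = 1^2 > 0$.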


Definition 7.2.3 Let F be any ordered field and $S \subseteq F$.

(1) The set S is bounded if there exist $a, b \in F$ with $a \le s \le b$ for all $s \in S$. The element b is called an upper bound for S and the element a is called a lower bound for S. An element $b \in F$ is a least upper bound, or LUB, for S if b is an upper bound for S and if $b_1$ is another upper bound for S then $b \le b_1$.

(2) Suppose $a, b \in F$. Then the closed interval with endpoints a, b is the set
$$[a, b] = \{x \in F : a \le x \le b\}.$$


Note that by reversing all the inequalities in the above definition, we could also 
define the greatest lower bound for S or GLB. The GLB is not necessary for our 
discussions. 


Definition 7.2.4 (1) An ordered field F satisfies the least upper bound property (the LUB property) if every nonempty subset $S \subseteq F$ which has an upper bound in F also has a least upper bound in F.

(2) An ordered field F satisfies the nested intervals property if whenever $I_n = [a_n, b_n] \subseteq F$, with $a_n < b_n$ for all n, is a sequence of nested closed intervals ($I_{n+1} \subseteq I_n$) whose lengths $b_n - a_n$ go to zero, there exists a unique point in F common to all the intervals, that is, $\bigcap_n I_n = \{x\}$ for a single $x \in F$.

The key result on the completeness of R is that these properties are equivalent to each other, and further equivalent to the fact that Cauchy sequences converge.


Theorem 7.2.2. Let F be an ordered field. Then the following are equivalent 
(1) F satisfies the LUB property. 
(2) F satisfies the nested intervals property. 
(3) Every Cauchy sequence in F actually converges. 


Definition 7.2.5 An ordered field is complete if it satisfies any (and hence all) of the 
properties in the last theorem. 


We then have: 


Theorem 7.2.3. The real number field R is a complete ordered field. 




7.2.2 The Construction of R


As we mentioned, there are several constructions that arrive at the reals R beginning with the rationals Q. In this section we describe the one most relevant to the construction of the p-adic numbers, a construction known as Cauchy completion. We present the proofs for R starting with Q, but apart from the statements about order these proofs use only the properties of absolute value, so they carry over to any normed field; we will refer back to them when we construct the p-adic fields.

Cauchy completion is a general procedure to embed an incomplete metric space M as a dense subset of a complete metric space $\hat{M}$. The complete metric space $\hat{M}$ is called the Cauchy completion of M. We explain these terms, which are in essence generalizations of properties of the reals.


Definition 7.2.6 A metric space is a set M with a distance function on it, that is, a function $d : M \times M \to \mathbb{R}$ satisfying

(1) $d(x, y) \ge 0$ and $d(x, y) = 0$ iff $x = y$;

(2) $d(x, y) = d(y, x)$;

(3) $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality).


The rational numbers Q and the real numbers R are metric spaces where $d(x, y) = |x - y|$.

In any metric space we can define sequences, convergence and Cauchy sequences 
exactly as in the real numbers. In general we say that a metric space M is complete 
if every Cauchy sequence in M converges to an element of M. 

A subset S of a metric space M is dense in M if given any $x \in M$ and real number $\varepsilon > 0$ there is an $s \in S$ with $d(x, s) < \varepsilon$. This means that any point in M is arbitrarily close to a point in S. This is equivalent to the fact that given $x \in M$ there exists a sequence $(s_n) \subseteq S$ whose limit is x. For example, the rationals are dense in the reals (see the exercises).

Notice that the equivalence of Cauchy sequence completeness to the least upper 
bound property, that holds in an ordered field, does not necessarily hold in a general 
metric space. In a metric space we may not have any order. 

Starting with the rationals Q, we want to construct an ordered field R which is the completion of the rationals with respect to absolute value distance. That is, we want to construct a field R such that $\mathbb{Q} \subseteq \mathbb{R}$, R is complete as a metric space, and Q is a dense subset of R.

The Cauchy completion of Q proceeds in the following manner: 

Step (1): Consider the set of all Cauchy sequences of rationals. That is, an element of this set is a Cauchy sequence $(q_1, q_2, \ldots)$ of rational numbers. Define on this set the relation
$$(q_1, q_2, \ldots) \sim (s_1, s_2, \ldots) \iff \lim_{i \to \infty} (q_i - s_i) = 0.$$
That is, beyond a sufficiently large index i, the terms of the two sequences are arbitrarily close.


Lemma 7.2.4 This defines an equivalence relation on the set of all Cauchy sequences of rationals.
We leave the proof of this lemma to the exercises.

Step (2): Let R be the set of all equivalence classes of Cauchy sequences of 
rationals under the equivalence relation above. We now want to show five things: 

(1) R is an ordered field. 

(2) $\mathbb{Q} \subseteq \mathbb{R}$.

(3) R is a metric space. 

(4) Q is dense in R. 

(5) R is complete. 

Step (3): We have the following theorem. 


Theorem 7.2.4 R is an ordered field. 


Proof To show that R is a field we have to show that we can define addition, additive 
inverses, multiplication and multiplicative inverses to satisfy the field axioms. To 
prove this we need to know that Cauchy sequences are bounded so we prove this 
first. 


Lemma 7.2.5 If $x = (x_1, x_2, \ldots)$ is a Cauchy sequence then $(x_n)$ is bounded, that is, there is a $B > 0$ with $|x_n| \le B$ for all n.


Proof Let $\varepsilon = 1$. Then since $(x_n)$ is a Cauchy sequence it follows that there exists an N such that $|x_n - x_m| < 1$ for all $n, m \ge N$. In particular if $n \ge N$ we have $|x_n - x_N| < 1$. Then if $n \ge N$ we have
$$|x_n| = |x_n - x_N + x_N| \le |x_n - x_N| + |x_N| < |x_N| + 1.$$
Now let $B = \max\{|x_1|, \ldots, |x_N|, |x_N| + 1\}$. Then from the above $|x_n| \le B$ for all n.


Now let $r, s \in \mathbb{R}$, i.e., r and s are equivalence classes of Cauchy sequences. So let $r = [(q_1, \ldots, q_n, \ldots)]$ and $s = [(t_1, \ldots, t_n, \ldots)]$ be the equivalence classes of the Cauchy sequences of rationals $(q_n)$ and $(t_n)$, respectively. We define
$$r + s = [(q_1 + t_1, \ldots, q_n + t_n, \ldots)], \qquad (7.1)$$
$$r \cdot s = [(q_1 t_1, \ldots, q_n t_n, \ldots)]. \qquad (7.2)$$


For this to make sense, we have to show that $(q_n + t_n)$ and $(q_n t_n)$ are again Cauchy sequences and that addition and multiplication of these equivalence classes are independent of the representatives chosen. That is, + and $\cdot$ defined this way are well-defined. Here we will show that multiplication of equivalence classes is well-defined and leave all the other verifications to the exercises. For this purpose, suppose that $(q_n) \sim (q'_n)$ and $(t_n) \sim (t'_n)$. Then we have $\lim (q_n - q'_n) = \lim (t_n - t'_n) = 0$. We must show that
$$\lim (q_n t_n - q'_n t'_n) = 0.$$
But all sequences here are Cauchy, hence they are all bounded. In particular, there exist $M_1 > 0$ and $M_2 > 0$ such that $|t_n| \le M_1$ and $|q'_n| \le M_2$ for all $n \in \mathbb{N}$. Now let $\varepsilon > 0$. Since $q_n - q'_n \to 0$ and $t_n - t'_n \to 0$, there exist $N_1$ and $N_2$ such that $|q_n - q'_n| < \frac{\varepsilon}{2M_1}$ for all $n \ge N_1$ and $|t_n - t'_n| < \frac{\varepsilon}{2M_2}$ for all $n \ge N_2$. Take $N = \max\{N_1, N_2\}$. Using properties of absolute values, we have for all $n \ge N$
$$\begin{aligned} |q_n t_n - q'_n t'_n| &= |(q_n t_n - q'_n t_n) + (q'_n t_n - q'_n t'_n)| && (7.3)\\ &\le |q_n - q'_n|\,|t_n| + |t_n - t'_n|\,|q'_n| && (7.4)\\ &< \frac{\varepsilon}{2M_1} M_1 + \frac{\varepsilon}{2M_2} M_2 = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. && (7.5) \end{aligned}$$
This shows that $q_n t_n - q'_n t'_n \to 0$. (The other verifications are done in a similar manner.)

Clearly $[(0, 0, \ldots)]$ and $[(1, 1, \ldots)]$ are additive and multiplicative identities, respectively. For this to make sense, $(0, 0, \ldots)$ and $(1, 1, \ldots)$ must be Cauchy sequences of rationals, which they clearly are. The properties of commutativity, associativity, and distributivity follow from the fact that these are true in Q. The additive inverse of $r = [(q_n)] \in \mathbb{R}$ is defined as $-r = [(-q_n)]$. It is clear that if $(q_n)$ is a Cauchy sequence of rationals, then so is $(-q_n)$. It follows that $[(-q_n)]$ makes sense as an element of R. Thus far we have shown that R is a commutative ring with unity.

It remains to show that every nonzero element of R has a multiplicative inverse. If $r = [(q_n)]$ is an equivalence class of a Cauchy sequence and $r \neq 0$, then $(q_n)$ does not converge to 0. We leave it as an exercise to show that then there exists an N such that for all $n \ge N$ we have $q_n \neq 0$. Therefore it makes sense to define
$$r^{-1} = \left[\left(0, \ldots, 0, \frac{1}{q_N}, \frac{1}{q_{N+1}}, \ldots\right)\right].$$
We need to show that the sequence $\left(0, \ldots, 0, \frac{1}{q_N}, \frac{1}{q_{N+1}}, \ldots\right)$ is a Cauchy sequence. This is also left as an exercise. Also note that
$$r \cdot r^{-1} = [(0, \ldots, 0, 1, 1, \ldots)] = [(1, 1, \ldots, 1, 1, \ldots)].$$
Thus we have shown that the set R with these operations is indeed a field.

Given $r = [(q_1, \ldots, q_n, \ldots)] \in \mathbb{R}$, we define $r > 0$, or r is positive, to mean $r \neq 0$ (that is, r is not equal to the equivalence class $[(0, 0, \ldots, 0, \ldots)]$) and $r = [(q_n)]$ for some Cauchy sequence of rationals such that $(q_n)$ does not converge to 0 and there exists an N such that $q_n > 0$ for all $n \ge N$. Again, since $r > 0$ was defined on equivalence classes (r is an equivalence class), we must show this is well-defined. We leave this verification to the exercises. If $r, s \in \mathbb{R}$, define $r > s$ to mean $r - s > 0$. This
defines an order on R and hence R is an ordered field. Again we leave the details to 
the exercises. 
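The definitions (7.1) and (7.2) act termwise on representatives. A minimal Python sketch, assuming we model a Cauchy sequence of rationals as a function from n to a `Fraction` (a modelling choice of ours, not the book's), illustrates this for the real number $\sqrt{2}$ of Example 7.2.1.1. The termwise square of its truncations differs from the constant sequence $(2, 2, \ldots)$ by terms tending to 0. Replacing one representative by an equivalent one changes the termwise product only by a null sequence. Checking finitely many terms illustrates these facts but does not prove them.

```python
from fractions import Fraction
from math import isqrt
from typing import Callable

# Hypothetical representation: a sequence of rationals (x_1, x_2, ...)
# is modelled as a function n -> Fraction for n = 1, 2, 3, ...
Seq = Callable[[int], Fraction]


def const(q) -> Seq:
    """The constant sequence (q, q, q, ...), i.e. the embedded rational q."""
    return lambda n: Fraction(q)


def add(s: Seq, t: Seq) -> Seq:
    """Termwise sum, mirroring definition (7.1)."""
    return lambda n: s(n) + t(n)


def mul(s: Seq, t: Seq) -> Seq:
    """Termwise product, mirroring definition (7.2)."""
    return lambda n: s(n) * t(n)


def sqrt2(n: int) -> Fraction:
    """sqrt(2) truncated to n - 1 decimal places (the sequence of Example 7.2.1.1)."""
    scale = 10 ** (n - 1)
    return Fraction(isqrt(2 * scale * scale), scale)


# Another representative of the same real number: (x_n + 1/n) ~ (x_n).
sqrt2_shifted = add(sqrt2, lambda n: Fraction(1, n))

# A representative of r * r - 2; its terms should tend to 0.
square_minus_two = add(mul(sqrt2, sqrt2), const(-2))

# The two termwise products differ by a sequence tending to 0.
product_gap = add(mul(sqrt2_shifted, sqrt2_shifted), mul(const(-1), mul(sqrt2, sqrt2)))

for n in (1, 5, 10, 20):
    print(n, float(abs(square_minus_two(n))), float(abs(product_gap(n))))
```

Here $|x_n^2 - 2| = (\sqrt{2} - x_n)(\sqrt{2} + x_n) < 3 \cdot 10^{-(n-1)}$, and $(x_n + 1/n)^2 - x_n^2 = 2x_n/n + 1/n^2 \le 4/n$, which is what the printed values reflect.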

Step (4): We now show that $\mathbb{Q} \subseteq \mathbb{R}$.


Theorem 7.2.5 $\mathbb{Q} \subseteq \mathbb{R}$. More precisely, we can embed Q as a subring of R.
Proof To each $q \in \mathbb{Q}$ associate the sequence $(q, q, q, q, \ldots)$. This is clearly a Cauchy sequence of rationals. Hence $[(q, q, q, q, \ldots)] \in \mathbb{R}$. Note that $(q, q, q, q, \ldots) \sim (\ldots, q, q, q, \ldots)$ for any sequence of rationals that is eventually constant with value q, so that $[(q, q, q, q, \ldots)] = [(\ldots, q, q, q, \ldots)]$. Consider the map $q \mapsto [(q, q, q, q, \ldots)]$. It is injective and, by (7.1) and (7.2), preserves sums and products, so it embeds Q into R and we can consider Q as a subset of R.


Note that this theorem implies that when we talk about a rational number q in R, what is meant is $q = [(\ldots, q, q, q, \ldots)]$ where q is a rational number.
Step (5): We must show that R is a metric space. 


Theorem 7.2.6 R is a metric space. 


Proof To make R a metric space we define an absolute value on R and then use this to define distance by $d(r, s) = |r - s|$. If $r = [(q_1, q_2, \ldots)] \in \mathbb{R}$ then we define $|r| = [(|q_1|, |q_2|, \ldots)]$. We must first show that $(|q_1|, |q_2|, \ldots)$ is again a Cauchy sequence of rationals. To see this, consider any $\varepsilon > 0$; then there exists an $N = N(\varepsilon)$ such that for all $n, m \ge N$ we have $|q_n - q_m| < \varepsilon$. This must be true since $(q_1, q_2, \ldots)$ is a Cauchy sequence. But now
$$|q_n| = |(q_n - q_m) + q_m| \le |q_n - q_m| + |q_m| \implies |q_n| - |q_m| \le |q_n - q_m|.$$
Similarly, we get $|q_m| - |q_n| \le |q_m - q_n| = |q_n - q_m|$. But this implies $|q_n| - |q_m| \ge -|q_n - q_m|$. Combining this with the display above gives
$$-|q_n - q_m| \le |q_n| - |q_m| \le |q_n - q_m| \implies \big||q_n| - |q_m|\big| \le |q_n - q_m|.$$
But this implies from above that $\big||q_n| - |q_m|\big| < \varepsilon$ for all $n, m \ge N$. Thus $(|q_1|, |q_2|, \ldots)$ is a Cauchy sequence of rationals and hence $|r| = [(|q_1|, |q_2|, \ldots)]$ is in R. We also
need to show that this definition of absolute value is well-defined because it was 
defined on equivalence classes. This is not hard using the inequality proved above 
and so is left for the exercises. It is also not hard to show that |r| satisfies the usual 
absolute value properties (see the exercises). Therefore $d(x, y) = |x - y|$ defines a metric on R.


Lemma 7.2.6 If $x, y \in \mathbb{R}$ and $\hat{w}$ is any rational number in R, then $|x - y| < \hat{w}$ means that if $x = [(x_n)]$, $y = [(y_n)]$, and $\hat{w} = [(w, w, \ldots)]$, there exists $N = N(w)$ such that $|x_n - y_n| < w$ for all $n \ge N$.


Proof By the above definitions, $|x - y| = [(|x_n - y_n|)] < \hat{w}$ means that $\hat{w} - |x - y| > 0$. Thus we must have that $\hat{w} - |x - y| = [(a_n)]$ where $(a_n)$ is a Cauchy sequence of rationals such that there exists N with $a_n > 0$ for all $n \ge N$. But $\hat{w} - |x - y| = [(w - |x_n - y_n|)]$. Since $(x_n)$ and $(y_n)$ are Cauchy, so is $(w - |x_n - y_n|)$. Indeed, given any $\varepsilon > 0$ there exist $N_1$ and $N_2$ such that $|x_n - x_m| < \varepsilon/2$ for all $n, m \ge N_1$ and $|y_n - y_m| < \varepsilon/2$ for all $n, m \ge N_2$. So if $n, m \ge \max\{N_1, N_2\}$, then
$$\big|(w - |x_n - y_n|) - (w - |x_m - y_m|)\big| = \big||x_m - y_m| - |x_n - y_n|\big|,$$
which by the inequality established in the proof of the theorem above is $\le |(x_m - x_n) + (y_n - y_m)|$, which by the triangle inequality is $< \varepsilon/2 + \varepsilon/2 = \varepsilon$. Thus $(w - |x_n - y_n|)$ is a Cauchy sequence of rationals and so it does make sense to consider the real number given by its equivalence class. But our given inequality implies that $[(w - |x_n - y_n|)] > 0$, which in turn by definition means there exists $N = N(w)$ such that $|x_n - y_n| < w$ for all $n \ge N$.


Step (6): We must show that Q is dense in R. 
Theorem 7.2.7 Q is a dense subset of R. 


Proof To prove this we have to show that any real number is arbitrarily close to 
a rational number. Here, we must be careful about what we mean by a rational 
number and what we mean by arbitrarily close. A rational number is an equivalence 
class of a sequence where the elements of the sequence are all just one and the 
same rational number (or at least are eventually that). When we say this rational is 
arbitrarily close to a real number, we mean that we can make the distance between 
these two real numbers as defined above less than any preassigned positive rational. 
(Also as defined above.) So let $r = [(q_1, q_2, q_3, \ldots)] \in \mathbb{R}$ and suppose that $w > 0$ is any (small) rational number, not an equivalence class yet! We need to show that there exists a rational number (here an equivalence class), call it $\hat{q}$, such that $|r - \hat{q}| < \hat{w}$ where $\hat{w} = [(\ldots, w, w, w, \ldots)]$. (Note that $\hat{w} > 0$ by our definition of the ordering on R.) Since $(q_n)$ is a Cauchy sequence there exists an $N = N(w)$ such that $|q_n - q_m| < w$ for all $n, m \ge N$. Choose a particular k with $k > N$ and set $\hat{q} = [(\ldots, q_k, q_k, q_k, \ldots)]$. This is an equivalence class of a Cauchy sequence of rationals and so $\hat{q} \in \mathbb{R}$. By the embedding above, we associate $\hat{q}$ with the rational number $q_k$. Now $|r - \hat{q}| = [(\ldots, |q_n - q_k|, |q_{n+1} - q_k|, \ldots)]$. Now, using the above lemma, since $|q_n - q_k| < w$ for $n \ge N$, we have that $|r - \hat{q}| < \hat{w}$.


Step (7): Finally, we must show that R is complete. Here we show completeness 
by Cauchy sequences but since we have shown that R is an ordered field this is 
equivalent to the LUB property and the nested intervals property. 


Theorem 7.2.8 R is complete. 


Proof To prove this we have to show that any Cauchy sequence of real numbers 
converges to a real number. Let $r_1, r_2, \ldots, r_n, \ldots$ be a Cauchy sequence of reals. We show that it has a limit which is a real number. Realize that each $r_i$ is itself an equivalence class of Cauchy sequences of rationals. For each n choose a rational
$$\hat{q}_n = [(\ldots, q_n, q_n, \ldots)],$$
that is, rational in the sense of our embedding, such that
$$|r_n - \hat{q}_n| < \widehat{1/n} \quad \text{where} \quad \widehat{1/n} = [(\ldots, 1/n, 1/n, \ldots)].$$
This can be done since Q is dense in R.

Consider the sequence of rationals $(q_1, q_2, q_3, \ldots, q_n, \ldots)$. We claim this sequence is a Cauchy sequence. Fix $\varepsilon > 0$, a (small) positive rational, not an equivalence class. Choose $N \in \mathbb{N}$ such that $1/N < \varepsilon/3$. Since $(r_n)$ is Cauchy there exists $M > 0$ such that $n, m \ge M \implies |r_n - r_m| < \widehat{\varepsilon/3}$ (here $\widehat{\varepsilon/3} = [(\ldots, \varepsilon/3, \varepsilon/3, \ldots)]$), and $n \ge N$ implies $\widehat{1/n} \le \widehat{1/N} < \widehat{\varepsilon/3}$. Now
$$|\hat{q}_n - \hat{q}_m| \le |\hat{q}_n - r_n| + |r_n - r_m| + |r_m - \hat{q}_m|$$
by the triangle inequality. Thus, for $n, m \ge \max\{M, N\}$ we have $|\hat{q}_n - \hat{q}_m| < \hat{\varepsilon}$ (here $\hat{\varepsilon} = [(\ldots, \varepsilon, \varepsilon, \ldots)]$). This means that $|q_n - q_m| < \varepsilon$ for all $n, m \ge \max\{M, N\}$. Thus $(q_n)$ is a Cauchy sequence of rationals. So it makes sense to consider the real number $r = [(q_n)] \in \mathbb{R}$. Further, we claim that $r_n \to r$. By construction, $\hat{q}_n - r_n \to 0$. But the fact that $\hat{q}_n \to r = [(q_n)]$ just follows from $(q_n)$ being a Cauchy sequence. For we need $|r - \hat{q}_m| = |[(q_n)] - \hat{q}_m|$ to be made small for m sufficiently large. But Lemma 7.2.6 says this is true if $|q_n - q_m|$ can be made small for n, m sufficiently large. This is precisely what it means for $(q_n)$ to be a Cauchy sequence. Thus we have proven $\lim (\hat{q}_n - r_n) = 0$ and $\lim (r - \hat{q}_n) = 0$. So
$$\lim (r - r_n) = \lim \big((r - \hat{q}_n) + (\hat{q}_n - r_n)\big) = 0.$$
Thus the Cauchy sequence $r_1, r_2, \ldots, r_n, \ldots$ has the limit r, showing that R is complete.


Having completed the Cauchy completion of Q, we no longer consider R to be 
a set of equivalence classes of Cauchy sequences of rationals. 


7.2.3 The Characterization of R 


Before moving on to the p-adic numbers, we provide a complete algebraic charac- 
terization of the reals. We need one additional property besides completeness. First 
note that an ordered field must have characteristic zero and hence contains a subring 
isomorphic to the integers. 


Definition 7.2.7 An ordered field F is archimedean if for any pair $f_1, f_2 \in F$ with $f_2 > f_1 > 0$ there exists an $n \in \mathbb{N}$ such that $n f_1 > f_2$.


The complete characterization of R is then given by completeness together with 
the archimedean property. 


Theorem 7.2.9 R is a complete archimedean ordered field. Further, any other com- 
plete archimedean ordered field is isomorphic to R. 


7.3 Normed Fields and Cauchy Completions 


The real numbers R are a completion of the rationals Q and are characterized as 
the unique (up to isomorphism) complete archimedean ordered field. The question 
arises as to whether there are other completions of the rationals. The answer is yes but 
they must be, by necessity, non-archimedean, and further are of a very special type. 
Notice that the construction of R from Q used the absolute value prominently, and Cauchy sequences and denseness were defined in terms of this distance. For the additional completions of Q we must define different distance functions on Q. We do this in general.
Definition 7.3.1 A norm on a field F is a function $| \cdot | : F \to \mathbb{R}$ satisfying
(1) $|x| \ge 0$ for all $x \in F$,
(2) $|x| = 0$ if and only if $x = 0$,
(3) $|xy| = |x||y|$ for all $x, y \in F$,
(4) $|x + y| \le |x| + |y|$ for all $x, y \in F$ (triangle inequality).
A normed field is a field F with a norm.


For example, Q and R are normed fields with the usual absolute value. Any 
normed field F is a metric space under $d(x, y) = |x - y|$. Since a normed field F
is a metric space the concepts of convergence, Cauchy sequence, completeness and 
denseness of subsets are all defined on F. As before we say that a normed field is 
complete if every Cauchy sequence within the field converges to an element in the 
field, that is within the field F the concepts of Cauchy sequence and convergent 
sequence coincide. 

The basic result is that any normed field F can be embedded as a dense subset of a complete normed field $\hat{F}$. The complete normed field obtained in this manner is called the Cauchy completion of F.


Theorem 7.3.1 Given a normed field F, there exists a complete normed field $\hat{F}$ in which F is a dense subfield. The field $\hat{F}$ is called the Cauchy completion of F.


The proof of Theorem 7.3.1 is identical to the proof that R can be constructed from Q. Apart from the statements about order, that proof used only the properties of absolute value, which are exactly the general norm properties. To construct $\hat{F}$ from F we follow exactly the same steps as in Section 7.2.2. We let $\hat{F}$ be the set of equivalence classes of Cauchy sequences from F, where two Cauchy sequences are equivalent if their difference goes to zero. We then show that $\hat{F}$ is a complete normed field and that F is a dense subset of $\hat{F}$. We leave the details to the exercises.


7.4 The p-Adic Fields 


Considering R as the completion of Q depended upon absolute value as the norm 
on Q. The question arose as to whether Q could be completed in any other way. The 
answer is yes but it requires a completely different norm on the rationals. As we 
saw in Theorem 7.2.9 the reals are characterized as a complete archimedean ordered 
field. Hence, if we are to complete Q relative to a different norm, this norm must be non-archimedean. Before describing this new norm (actually infinitely many new
norms) on Q we discuss some properties of norms in general. 

Since the completion of a normed field depends on Cauchy sequences we consider 
two norms to be equivalent if they give rise to exactly the same Cauchy sequences. 
Definition 7.4.1 Two norms on a normed field F are equivalent if their induced metrics are equivalent. That is, $| \cdot |_1$ is equivalent to $| \cdot |_2$ if a sequence is Cauchy with respect to one metric if and only if it is Cauchy with respect to the other.


The next result gives a condition for equivalence of norms. 


Theorem 7.4.1 Two norms $| \cdot |_1$ and $| \cdot |_2$ on a normed field F are equivalent if and only if there exists an $\alpha > 0$ such that
$$|x|_2 = |x|_1^{\alpha} \quad \text{for all } x \in F.$$


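Proof Suppose that $|x|_2 = |x|_1^{\alpha}$ for all $x \in F$ and suppose that $(x_n)$ is a Cauchy sequence relative to the first norm. Given $\varepsilon > 0$, let N be found for $\varepsilon^{1/\alpha}$. Then for $m, n \ge N$ we have $|x_n - x_m|_1 < \varepsilon^{1/\alpha}$, so that $|x_n - x_m|_2 < \varepsilon$. Therefore $(x_n)$ is a Cauchy sequence relative to the second norm. Since $|x|_1 = |x|_2^{1/\alpha}$, the same argument applies with the roles of the two norms reversed, and the two norms are equivalent.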

Conversely, suppose the two norms are equivalent. Choose an $a \in F$ with $0 < |a|_1 < 1$. This is possible since we have a nontrivial norm. Then let
$$\alpha = \frac{\log(|a|_2)}{\log(|a|_1)}.$$
It follows that $|a|_2 = (|a|_1)^{\alpha}$. We show this is true for all $x \in F$. We show this for $0 < |x|_1 < 1$; the other cases follow the same argument.

Consider the set $S = \{r = \frac{m}{n} : m, n \in \mathbb{N},\ (|x|_1)^m < (|a|_1)^n\}$. Then for any $r \in S$ we have $(|x|_1)^m < (|a|_1)^n$, so that $\left|\frac{x^m}{a^n}\right|_1 < 1$. But then $\left|\frac{x^m}{a^n}\right|_2 < 1$ and so $(|x|_2)^m < (|a|_2)^n$. The same argument with $| \cdot |_2$ replacing $| \cdot |_1$ shows that for the same S we have
$$S = \left\{\frac{m}{n} : m, n \in \mathbb{N},\ (|x|_2)^m < (|a|_2)^n\right\}.$$
By taking logarithms, and since the logarithms involved are all negative, we then must have
$$r \in S \iff r > \frac{\log |a|_1}{\log |x|_1} \iff r > \frac{\log |a|_2}{\log |x|_2}.$$
We then must have
$$\frac{\log |a|_1}{\log |x|_1} = \frac{\log |a|_2}{\log |x|_2},$$
because otherwise there would be a rational number between these two values, which would lie in S according to one description and not according to the other. However, this equality implies
$$\alpha = \frac{\log |a|_2}{\log |a|_1} = \frac{\log |x|_2}{\log |x|_1},$$
that is, $|x|_2 = |x|_1^{\alpha}$, and we have the result.


On the rational numbers the absolute value is a norm. The next lemma describes 
norms equivalent to absolute value on Q. 


Lemma 7.4.1 On the rational numbers Q with absolute value $| \cdot |$, the function $|x|_{\alpha} = |x|^{\alpha}$ with $\alpha > 0$ is a norm on Q if and only if $\alpha \le 1$. In this case it is equivalent to absolute value $| \cdot |$.


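Proof Let $|x|_{\alpha} = |x|^{\alpha}$ with $0 < \alpha \le 1$. We show that this is a norm on Q. The other properties of a norm are direct, so we need only show the triangle inequality. Consider $|x + y|_{\alpha} = |x + y|^{\alpha}$. We may assume that $0 < |y| \le |x|$ (if $y = 0$ there is nothing to prove). Then
$$\begin{aligned} |x + y|_{\alpha} = |x + y|^{\alpha} &\le (|x| + |y|)^{\alpha} = |x|^{\alpha}\left(1 + \frac{|y|}{|x|}\right)^{\alpha} \le |x|^{\alpha}\left(1 + \frac{|y|}{|x|}\right) \\ &\le |x|^{\alpha}\left(1 + \frac{|y|^{\alpha}}{|x|^{\alpha}}\right) = |x|^{\alpha} + |y|^{\alpha} = |x|_{\alpha} + |y|_{\alpha}. \end{aligned}$$
Here we used that $(1 + t)^{\alpha} \le 1 + t$ and $t \le t^{\alpha}$ for $0 \le t \le 1$ and $0 < \alpha \le 1$.

Conversely, if $\alpha > 1$ then the triangle inequality is not satisfied. For example,
$$|1 + 1|_{\alpha} = 2^{\alpha} > 2 = 1^{\alpha} + 1^{\alpha}.$$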
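As a sanity check on Lemma 7.4.1 (not a proof), the following Python sketch tests the triangle inequality for $|x|^{\alpha}$ on a small finite sample of rationals. The sample, the tolerance, and the helper names are our own choices; floating-point powers are compared with a small tolerance to absorb rounding.

```python
from fractions import Fraction
from itertools import product


def power_abs(x: Fraction, alpha: float) -> float:
    """The candidate norm |x|_alpha = |x| ** alpha (helper name is our own)."""
    return float(abs(x)) ** alpha


# A finite sample of rationals k/7 with -10 <= k <= 10.
SAMPLES = [Fraction(k, 7) for k in range(-10, 11)]


def triangle_holds(alpha: float, tol: float = 1e-12) -> bool:
    """Check |x + y|_alpha <= |x|_alpha + |y|_alpha on all sample pairs (up to rounding)."""
    return all(
        power_abs(x + y, alpha) <= power_abs(x, alpha) + power_abs(y, alpha) + tol
        for x, y in product(SAMPLES, repeat=2)
    )


print(triangle_holds(0.5), triangle_holds(1.0), triangle_holds(2.0))  # True True False
```

For $\alpha = 2$ the check fails already at $x = y = 1$, which is the counterexample in the proof.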


The archimedean property and its negation are crucial for our additional comple- 
tions of Q so we make the definitions formal. 


Definition 7.4.2 A norm $| \cdot |$ on a field F is archimedean if given $x, y \in F$ with $x \neq 0$ there exists an integer n with $|nx| > |y|$. If a norm is not archimedean it is called non-archimedean.


Non-archimedean norms satisfy a very special version of the triangle inequality.
Lemma 7.4.2 A norm $| \cdot |$ on F is non-archimedean if and only if it satisfies
$$|x + y| \le \max(|x|, |y|) \quad \text{for all } x, y \in F.$$


The inequality above is called the strong triangle inequality. The induced metric is called an ultrametric and satisfies
$$d(x, z) \le \max(d(x, y), d(y, z)).$$
We leave the proof to the exercises. However, recall that an ordered field must
have characteristic 0 and hence contains a copy of the rational integers Z. Non- 
archimedean norms on a field F are also characterized by the norms of the integers. 


Theorem 7.4.2 (1) The norm $| \cdot |$ is non-archimedean if and only if $|n| \le 1$ for all integers n.
(2) The norm $| \cdot |$ is archimedean if and only if
$$\sup\{|n| : n \in \mathbb{Z}\} = \infty.$$


Proof Suppose that $|\cdot|$ is non-archimedean. For any norm we have $|1| = 1$. We now use induction on the natural numbers, which we may regard as elements of $F$. Assume $|k| \le 1$ and consider $|k + 1|$. Then $|k + 1| \le \max\{|k|, |1|\} \le 1$, so the assertion holds for all natural numbers by induction. Since $|-x| = |x|$ and $|0| = 0$, the assertion holds for all integers.

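Conversely, suppose that $|n| \le 1$ for all integers $n$. We show that $|x + y| \le \max\{|x|, |y|\}$. For every $n \ge 1$ we have

$$|x + y|^n = |(x + y)^n| = \left|\sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\right| \le \sum_{k=0}^{n} \left|\binom{n}{k}\right| |x|^k |y|^{n-k}.$$

But $\binom{n}{k}$ is an integer, so $\left|\binom{n}{k}\right| \le 1$ and

$$|x + y|^n \le \sum_{k=0}^{n} |x|^k |y|^{n-k} \le (n + 1) \max\{|x|, |y|\}^n.$$

Hence

$$|x + y| \le (n + 1)^{1/n} \max\{|x|, |y|\} \quad \text{for all } n.$$

Since $(n + 1)^{1/n} \to 1$, taking the limit as $n \to \infty$ gives the strong triangle inequality. This completes part (1).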

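For part (2), if $|\cdot|$ is archimedean then by part (1) there is an integer $m$ with $|m| > 1$, and then $|m^k| = |m|^k \to \infty$, so there are integers of arbitrarily large norm. Conversely, if the supremum is infinite then some integer has norm greater than 1, and by part (1) the norm is archimedean.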


7.4.1 The p-Adic Norm 


For each prime $p$, we now introduce a non-archimedean norm on the rational numbers. Completion of $\mathbb{Q}$ with respect to this norm will give us the field of $p$-adic numbers. Since it is non-archimedean, this $p$-adic norm is not equivalent to the absolute value, and hence, as normed fields, none of the $p$-adic fields is isomorphic to $\mathbb{R}$. Further, we will show that for different primes $p_1$ and $p_2$ the $p_1$-adic norm is not equivalent to the $p_2$-adic norm.
Let $x = \frac{m}{n}$ be a nonzero rational number where $(m, n) = 1$. As we remarked in Chapter 2, the fundamental theorem of arithmetic implies that $x$ also has a unique prime decomposition

$$x = \pm p_1^{e_1} \cdots p_k^{e_k},$$

where here the exponents $e_i$ are allowed to be negative. Now let $p$ be a fixed prime and $x \in \mathbb{Q}$, $x \ne 0$. Then it follows from the prime decomposition that

$$x = p^{\alpha}\left(\frac{a}{b}\right)$$

with integers $a, b$ such that $(a, b) = 1$, $p \nmid ab$ and $\alpha \in \mathbb{Z}$. We now define the $p$-adic norm of the rational number $x$ by

$$|x|_p = p^{-\alpha} \text{ if } x \ne 0, \qquad |0|_p = 0.$$

The map $\operatorname{ord} : \mathbb{Q} \setminus \{0\} \to \mathbb{Z}$ given by $\operatorname{ord}(x) = \alpha$ is called the $p$-adic valuation.
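The definition translates directly into exact rational arithmetic. The following Python sketch, using the standard `fractions` module, computes $\operatorname{ord}$ and $|x|_p$; the names `ord_p` and `norm_p` are illustrative, not from the text.

```python
from fractions import Fraction

def ord_p(x, p):
    """p-adic valuation of a nonzero rational x: the exponent alpha in x = p^alpha * a/b."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("ord_p(0) is infinite by convention")
    num, den = x.numerator, x.denominator  # Fraction keeps these coprime
    alpha = 0
    while num % p == 0:   # each factor p in the numerator raises the valuation
        num //= p
        alpha += 1
    while den % p == 0:   # each factor p in the denominator lowers it
        den //= p
        alpha -= 1
    return alpha

def norm_p(x, p):
    """p-adic norm |x|_p = p^(-ord_p(x)), with |0|_p = 0, as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** ord_p(x, p)

x, y = Fraction(63, 550), Fraction(7, 10)
print(norm_p(x, 5), norm_p(x, 3))                            # 25 1/9
print(norm_p(x + y, 5) <= max(norm_p(x, 5), norm_p(y, 5)))   # True
```

Because `Fraction` always stores numerator and denominator in lowest terms, at most one of the two loops strips factors of $p$, exactly as in the decomposition $x = p^\alpha (a/b)$ with $p \nmid ab$.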


Lemma 7.4.3 For any prime $p$, the $p$-adic norm is a non-archimedean norm on $\mathbb{Q}$. Further, $|\cdot|_p$ can take on only a discrete set of values.


Proof The basic norm properties are straightforward computations and we leave them to the exercises. From the definition, the $p$-adic norm of any nonzero rational is $p^{-m}$ for some integer $m$. Therefore, the $p$-adic norm can take on only a discrete set of values. Finally, for any integer $n$ it is clear that the $p$-adic norm is 1 or less. Therefore, by Theorem 7.4.2, this norm must be non-archimedean.


Since for any prime $p$ the $p$-adic norm is a norm, it defines a $p$-adic distance function on $\mathbb{Q}$ given by

$$d(x, y) = |x - y|_p.$$

Further, since the norm is non-archimedean, the $p$-adic distance function is an ultrametric and satisfies

$$d(x, z) \le \max(d(x, y), d(y, z)).$$


The $p$-adic norm of any natural number $n$ is less than or equal to one. On the other hand, if $n > 1$ we have $|n| > 1$ for the ordinary absolute value. In particular, $|p|_p = p^{-1} < 1 < p^\alpha = |p|^\alpha$ for every real number $\alpha > 0$, and hence for no prime is the $p$-adic norm equivalent to the standard absolute value.


Lemma 7.4.4 For each prime $p$ the corresponding $p$-adic norm on $\mathbb{Q}$ is not equivalent to the standard absolute value on $\mathbb{Q}$.


Next, we show that for distinct primes $p_1$ and $p_2$ the corresponding norms are inequivalent.
Lemma 7.4.5 If $p_1, p_2$ are distinct primes then the corresponding $p$-adic norms are inequivalent.


Proof Suppose that $p_1 \ne p_2$. Let $x_n = \left(\frac{p_1}{p_2}\right)^n$. In the $p_1$-adic norm, $|x_n|_{p_1} = p_1^{-n}$ goes to zero, and hence $(x_n)$ converges and is therefore a Cauchy sequence. However, in the $p_2$-adic norm, $|x_n|_{p_2} = p_2^{n}$ goes to infinity, and hence $(x_n)$ diverges and is therefore not a Cauchy sequence. It follows that the two norms are not equivalent.
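The sequence in the proof is easy to tabulate. This sketch restates a minimal `norm_p` helper (our own name) so the block stands alone; the choice $p_1 = 2$, $p_2 = 3$ is arbitrary.

```python
from fractions import Fraction

def norm_p(x, p):
    """p-adic norm of a rational x (with |0|_p = 0), computed exactly."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, alpha = x.numerator, x.denominator, 0
    while num % p == 0:      # powers of p in the numerator raise ord_p
        num //= p
        alpha += 1
    while den % p == 0:      # powers of p in the denominator lower ord_p
        den //= p
        alpha -= 1
    return Fraction(1, p) ** alpha

p1, p2 = 2, 3
for n in range(1, 6):
    x_n = Fraction(p1, p2) ** n
    # |x_n|_{p1} = p1^(-n) shrinks to 0, while |x_n|_{p2} = p2^n grows without bound
    print(n, norm_p(x_n, p1), norm_p(x_n, p2))
```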


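Finally, we show that being close in the $p$-adic norm is equivalent to being congruent modulo $p^n$. That is:


Lemma 7.4.6 If $a, b \in \mathbb{N}$, then $a \equiv b \bmod p^n$ if and only if $|a - b|_p \le p^{-n}$.


Proof Suppose that $a, b \in \mathbb{N}$ and $a \equiv b \bmod p^n$. Then $p^n \mid (a - b)$. It follows that either $a = b$ or $\operatorname{ord}(a - b) \ge n$, and in both cases $|a - b|_p \le p^{-n}$. Conversely, if $|a - b|_p \le p^{-n}$, then either $a = b$ or $\operatorname{ord}(a - b) \ge n$; hence $p^n \mid (a - b)$ and $a \equiv b \bmod p^n$.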
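A brute-force check of Lemma 7.4.6 over a small range may help fix the idea. The sketch below works purely with integers, reading $|a - b|_p \le p^{-n}$ as "$a = b$ or $\operatorname{ord}(a - b) \ge n$"; the function names and the ranges tested are our own.

```python
def ord_p_int(m, p):
    """p-adic valuation of a nonzero integer m (m may be negative)."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k

def close_p_adically(a, b, p, n):
    """True iff |a - b|_p <= p^(-n), i.e. a == b or ord_p(a - b) >= n."""
    return a == b or ord_p_int(a - b, p) >= n

def lemma_7_4_6_holds(p, n, bound):
    """Compare 'a = b mod p^n' with 'p-adically close' for all 0 <= a, b < bound."""
    return all(
        ((a - b) % p**n == 0) == close_p_adically(a, b, p, n)
        for a in range(bound)
        for b in range(bound)
    )

print(lemma_7_4_6_holds(2, 3, 40))  # True
```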


7.5 The Construction of $\mathbb{Q}_p$


For each prime $p$, the rational numbers equipped with the $p$-adic norm form a non-archimedean normed field. Using the Cauchy completion procedure we can construct a complete normed field that contains the rationals as a dense subset with respect to the induced $p$-adic distance. For a given prime $p$ this is the field of $p$-adic numbers, which we denote by $\mathbb{Q}_p$. Each of these fields is non-archimedean and hence not isomorphic, as a normed field, to the real numbers $\mathbb{R}$. Further, by Lemma 7.4.5, for distinct primes $p_1, p_2$ the corresponding norms are inequivalent, and therefore the corresponding fields are distinct as normed fields. We therefore have the following theorem.


Theorem 7.5.1 For each prime $p$, the field $\mathbb{Q}_p$ of $p$-adic numbers is a complete non-archimedean normed field which contains the rational numbers $\mathbb{Q}$ as a dense subset. Further, each of these fields is distinct from the real numbers $\mathbb{R}$, and for different primes $p_1, p_2$ the fields are distinct.


In Section 7.7, we will prove a type of converse to this result (Ostrowski’s theorem) and show that $\mathbb{R}$ and the $\mathbb{Q}_p$ are, up to equivalence of norms, the only complete normed extensions of $\mathbb{Q}$ that have $\mathbb{Q}$ as a dense subfield. In Section 7.8, we use a property of the $p$-adic fields to prove that $\mathbb{R}$ is not isomorphic (as a field) to any $\mathbb{Q}_p$, and that if $p_1, p_2$ are distinct primes then $\mathbb{Q}_{p_1}$ and $\mathbb{Q}_{p_2}$ are non-isomorphic.


7.5.1 p-Adic Arithmetic and p-Adic Expansions 


As we remarked at the beginning of this chapter, the common way to handle real number arithmetic is via decimal expansions. As we pointed out, though, any base can be used, and for computer hardware some form of binary expansion is usually used. In a similar manner, given a fixed prime $p$, each $p$-adic number has a unique $p$-adic expansion which allows arithmetic to be carried out. This expansion uses $p$-adic digits, that is, the numbers $0, 1, \ldots, p - 1$, and arithmetic on the digits must be done modulo $p$. In decimal expansions of real numbers there is always the ambiguity between trailing 9s and trailing 0s; for example, $.39999\ldots$ and $.40000\ldots$ define the same number. Because the $p$-adic expansions are unique this ambiguity does not occur, and $p$-adic representations are often preferable for computer arithmetic.


Theorem 7.5.2 Let $p$ be a fixed prime and $\mathbb{Q}_p$ the field of $p$-adic numbers. Then each $p$-adic number $x \in \mathbb{Q}_p$ has a canonical $p$-adic expansion

$$x = \sum_{n=-m}^{\infty} d_n p^n$$

with $d_n \in \{0, 1, \ldots, p - 1\}$. This expansion is unique.


Proof Let $p$ be a fixed prime and $\mathbb{Q}_p$ the corresponding $p$-adic field. To start, consider rational numbers $x$ with $|x|_p \le 1$. We show that for any $i \in \mathbb{N}$ there exists a rational integer $\alpha$ with $|\alpha - x|_p \le p^{-i}$, and further that we can take $\alpha \in \{0, 1, 2, \ldots, p^i - 1\}$. In this range the rational integer $\alpha$ is unique.

To see this, write $x = \frac{a}{b}$ with $a, b$ integers such that $(a, b) = 1$, and let $i \in \mathbb{N}$. Since $|x|_p \le 1$, the denominator $b$ is relatively prime to $p$ and hence also to $p^i$. Hence there exist $m, n \in \mathbb{Z}$ with $mb + np^i = 1$. Now let $\alpha = am$. It follows that

$$|\alpha - x|_p = \left|am - \frac{a}{b}\right|_p = \left|\frac{a}{b}\right|_p |mb - 1|_p \le |mb - 1|_p = |np^i|_p \le p^{-i}.$$

Recall that the strong triangle inequality holds for the $p$-adic norm, that is, $|y + z|_p \le \max\{|y|_p, |z|_p\}$. Hence in the inequality above we can add a multiple of $p^i$ to $\alpha$ to get an integer $\alpha^*$ in the range $\{0, 1, \ldots, p^i - 1\}$ for which

$$|\alpha^* - x|_p \le p^{-i}.$$

Uniqueness follows because two such integers $\alpha^*, \beta^*$ satisfy $|\alpha^* - \beta^*|_p \le p^{-i}$, so they are congruent modulo $p^i$ by Lemma 7.4.6, and the range contains only one representative of each residue class modulo $p^i$.

Next recall that if $a \in \mathbb{Q}_p$ then $a$ is given by an equivalence class of Cauchy sequences (under the $p$-adic norm) of rationals. We claim that if $|a|_p \le 1$ then there is a unique such Cauchy sequence $(a_1, a_2, \ldots)$ representing $a$ with

$$a_i \in \mathbb{Z}, \quad 0 \le a_i < p^i \quad \text{and} \quad a_i \equiv a_{i+1} \bmod p^i.$$


Let $(b_i)$ be a Cauchy sequence of rationals representing $a$. We show that there is an equivalent Cauchy sequence $(a_i)$ satisfying the conditions above and that it is unique. Since $|b_i|_p \to |a|_p \le 1$ as $i \to \infty$ and the nonzero values of the norm are the isolated powers $p^m$, we may assume, after throwing away some initial terms if necessary, that $|b_i|_p \le 1$ for all $i$.

Now for each $j = 1, 2, \ldots$ let $N_j$ be a positive integer such that $|b_i - b_{i'}|_p \le p^{-j}$ for all $i, i' \ge N_j$. We may assume that the sequence $N_j$ is increasing, so that $N_j \ge j$ for all $j$. From the first part of the proof there are then integers $a_j$ with $0 \le a_j < p^j$ and

$$|a_j - b_{N_j}|_p \le \frac{1}{p^j}.$$

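Now consider the sequence $(a_j)$. We first show that consecutive terms are congruent. We have

$$|a_{j+1} - a_j|_p = |a_{j+1} - b_{N_{j+1}} + b_{N_{j+1}} - b_{N_j} + b_{N_j} - a_j|_p.$$

Using the strong triangle inequality, and $N_{j+1} \ge N_j$,

$$|a_{j+1} - a_j|_p \le \max\{|a_{j+1} - b_{N_{j+1}}|_p, |b_{N_{j+1}} - b_{N_j}|_p, |a_j - b_{N_j}|_p\} \le \max\left\{\frac{1}{p^{j+1}}, \frac{1}{p^j}, \frac{1}{p^j}\right\} = \frac{1}{p^j}.$$

Therefore, by Lemma 7.4.6, $a_j \equiv a_{j+1} \bmod p^j$, and consequently $a_i \equiv a_j \bmod p^j$, that is $|a_i - a_j|_p \le p^{-j}$, for all $i \ge j$.

Next, for $j \in \mathbb{N}$ and $i \ge N_j$ we have

$$|a_i - b_i|_p = |a_i - a_j + a_j - b_{N_j} + b_{N_j} - b_i|_p.$$

From the strong triangle inequality we obtain

$$|a_i - b_i|_p \le \max\{|a_i - a_j|_p, |a_j - b_{N_j}|_p, |b_i - b_{N_j}|_p\} \le \frac{1}{p^j}.$$

It follows that

$$|a_i - b_i|_p \to 0,$$

and hence $(a_i)$ is also a Cauchy sequence in the $p$-adic norm and also represents $a$.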

We show that the sequence $(a_i)$ is unique. Suppose that $(a_i')$ is another Cauchy sequence representing $a$ and satisfying $0 \le a_i' < p^i$ and $a_i' \equiv a_{i+1}' \bmod p^i$. Suppose that for some $j$ we have $a_j \ne a_j'$. Since both lie in $\{0, 1, \ldots, p^j - 1\}$, they are incongruent modulo $p^j$. Then for any $i \ge j$ we must have

$$a_i \equiv a_j \not\equiv a_j' \equiv a_i' \bmod p^j.$$

This then implies that

$$|a_i - a_i'|_p > \frac{1}{p^j} \quad \text{for all } i \ge j,$$

contradicting the fact that $(a_i)$ and $(a_i')$ are equivalent Cauchy sequences both representing $a$.

It is from this unique sequence $(a_i)$ that we construct the canonical $p$-adic expansion. Assume first, as before, that $|a|_p \le 1$. Each $a_i \in \mathbb{Z}$ with $0 \le a_i < p^i$, and hence each $a_i$ has a $p$-ary expansion

$$a_i = d_0 + d_1 p + \cdots + d_{i-1} p^{i-1}$$

with each $d_j \in \{0, 1, \ldots, p - 1\}$. Since $a_i \equiv a_{i+1} \bmod p^i$, it follows that

$$a_{i+1} = d_0 + d_1 p + \cdots + d_{i-1} p^{i-1} + d_i p^i$$

with $d_i \in \{0, 1, \ldots, p - 1\}$; that is, passing from $a_i$ to $a_{i+1}$ only appends a new digit. From this it follows that $a$ is represented by the infinite series

$$a = \sum_{i=0}^{\infty} d_i p^i,$$

which converges in the $p$-adic norm to $a$. Thus $a$ can be considered as a sequence of $p$-adic digits which extends infinitely far to the left:

$$a = \cdots d_i \cdots d_2 d_1 d_0.$$

This sequence uniquely represents $a$ and is called the canonical $p$-adic expansion.

Up to now, we have considered $p$-adic norms less than or equal to 1. Now suppose that $|a|_p > 1$. If $|a|_p = p^m$ then $a = p^{-m} a'$ with $|a'|_p = 1$. Applying the previous case to $a'$, it follows that $a$ is a convergent series of $p$-adic digits $d_i \in \{0, 1, \ldots, p - 1\}$ of the form

$$a = \sum_{i=-m}^{\infty} d_i p^i$$

with $d_{-m} \ne 0$. Thus we can represent $a$ as a sequence of $p$-adic digits with a point, infinitely many digits to the left of the point and finitely many digits to the right. That is,

$$a = \cdots d_n d_{n-1} \cdots d_0 . d_{-1} d_{-2} \cdots d_{-m}.$$

For any $a \in \mathbb{Q}_p$ there is a unique such expansion, and this is called the canonical $p$-adic expansion.


We will call the dot after $d_0$, because of familiarity, the decimal point of the expansion, although of course it has nothing to do with the standard decimal point. Notice further that in real number arithmetic there is an ambiguity in the expansion no matter what base is used. In decimal arithmetic, that is base 10, for example, $4.000000\ldots$ and $3.999999\ldots$ represent the same number. In $p$-adic arithmetic there is no such ambiguity. This can often be used advantageously in doing rational arithmetic on a computer.

In order to do arithmetic in $\mathbb{Q}_p$ we first need to discuss how to find $p$-adic expansions, especially for rational numbers. We start with representations of the integers. Notice first that multiplying by $p^n$ with $n > 0$ moves the decimal point $n$ places to the right, while multiplying by $p^{-n}$ moves it $n$ places to the left. So, for example, in $\mathbb{Q}_5$ we have

$$23434134. \times 5^2 = 2343413400.$$

while

$$23434134. \times 5^{-2} = 234341.34$$


Now, let $n \in \mathbb{N}$. Then, as we showed in Section 7.1, $n$ has a $p$-ary expansion

$$n = a_0 + a_1 p + \cdots + a_{k-1} p^{k-1} + a_k p^k$$

with $a_i \in \{0, 1, \ldots, p - 1\}$. The $p$-adic expansion of $n$ is then this $p$-ary expansion written in the reverse order:

$$n = a_k a_{k-1} \cdots a_0.$$


Lemma 7.5.1 Consider $n \in \mathbb{N}$ with $p$-ary expansion

$$n = a_0 + \cdots + a_k p^k.$$

Then the $p$-adic expansion is $n = a_k \cdots a_0.$, the $p$-ary expansion with the digits in the reverse order.


EXAMPLE 7.5.1 Find the 5-adic expansion of 17.

Here we first find $17 = 2 + 3 \cdot 5$. Hence the 5-adic expansion of 17 is $32.$
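Lemma 7.5.1 amounts to repeated division by $p$. A minimal Python sketch follows; the function names are hypothetical, digits are stored least significant first, and the string form assumes $p \le 10$ so that each digit is a single character.

```python
def padic_digits_of_natural(n, p):
    """Return [a_0, a_1, ..., a_k] with n = a_0 + a_1 p + ... + a_k p^k."""
    if n < 0:
        raise ValueError("n must be a natural number")
    digits = []
    while True:
        n, r = divmod(n, p)   # r is the next p-ary digit, n the remaining quotient
        digits.append(r)
        if n == 0:
            return digits

def padic_string(digits):
    """Write digits with the highest power on the left and a trailing point, e.g. '32.'."""
    return "".join(str(d) for d in reversed(digits)) + "."

print(padic_string(padic_digits_of_natural(17, 5)))  # 32.
print(padic_string(padic_digits_of_natural(68, 5)))  # 233.
```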

$p$-adic arithmetic is then done much as standard decimal arithmetic, but “carries” must be done modulo $p$ and taken to the left. We will discuss this more later, but we show here how this is done in $\mathbb{Q}_5$.

EXAMPLE 7.5.2 Show that $4 \times 17 = 68$ using the 5-adic expansions.

We have the 5-adic expansions $4 = 4.$ and $17 = 32.$ Then $4 \times 17$ is given by

$$4. \times 32. = 233.$$

To see how we obtain this, notice that $4 \times 2 = 8$, and 8 has the 5-adic expansion $13.$ Therefore the first digit is 3 and we carry the 1 to the left. Then $4 \times 3 + 1 = 13$, which has the 5-adic expansion $23.$ Therefore the 5-adic expansion $233.$ is the final result.

A quick computation shows that this 5-adic expansion has the value $2 \cdot 25 + 3 \cdot 5 + 3 = 68$, as expected.
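The carrying procedure of Example 7.5.2 can be sketched as schoolbook multiplication on digit lists. In this sketch (our own representation: digits least significant first, truncated to the first $k$ digits, since only finitely many digits can be stored) each carry moves one place to the left, and digit $n$ of the product depends only on digits $0, \ldots, n$ of the factors, so truncation does not affect the digits that are kept.

```python
def padic_mul(x, y, p, k):
    """First k p-adic digits of x*y, where x and y are digit lists [d_0, d_1, ...]
    (least significant digit first); missing digits are treated as 0."""
    x = (list(x) + [0] * k)[:k]
    y = (list(y) + [0] * k)[:k]
    result = []
    carry = 0
    for n in range(k):
        # coefficient of p^n: sum of x_i * y_(n-i), plus the carry from the right
        total = carry + sum(x[i] * y[n - i] for i in range(n + 1))
        carry, digit = divmod(total, p)
        result.append(digit)
    return result

print(padic_mul([4], [2, 3], 5, 4))  # [3, 3, 2, 0], i.e. ...0233.
```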
We next consider the representations of negative numbers. 
Lemma 7.5.2 If

$$a = \sum_{i=n}^{\infty} a_i p^i \quad \text{with } a_n \ne 0,$$

then

$$-a = \sum_{i=n}^{\infty} b_i p^i,$$

where $b_n = p - a_n$ and $b_i = (p - 1) - a_i$ if $i > n$.


We defer the proof to the exercises, but here instead present an example which exhibits the method.

EXAMPLE 7.5.3 Find the 5-adic expansion of $-3$. Suppose that $-3 = \cdots a_n a_{n-1} \cdots a_0.$ Then $\cdots a_n a_{n-1} \cdots a_0. + 3. = 0.$ Working from the right, we have $a_0 + 3 \equiv 0 \bmod 5$ and hence $a_0 = 2$. It follows that $-3 = \cdots a_n \cdots a_1 2.$ Adding $3.$ to this we get $2 + 3 = 5$, which has the 5-adic expansion $10.$, so we carry the 1 to get $a_1 + 1 \equiv 0 \bmod 5$. Hence $a_1 = 4$, since arithmetic on the digits is done modulo 5. Continuing in this manner we obtain

$$-3 = \cdots 44442.$$
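Lemma 7.5.2 can be checked against a second route to the same digits: the first $k$ digits of an integer $x$ are the base-$p$ digits of $x \bmod p^k$, which works for negative $x$ as well. Both helper names below are hypothetical.

```python
def padic_negate(digits, p):
    """Negate a p-adic number given by its first k digits [d_0, ..., d_(k-1)]
    using Lemma 7.5.2: the lowest nonzero digit d_n becomes p - d_n, every
    later digit d_i becomes (p - 1) - d_i, and the zeros below d_n stay 0."""
    result = list(digits)
    for n, d in enumerate(digits):
        if d != 0:
            result[n] = p - d
            for i in range(n + 1, len(digits)):
                result[i] = (p - 1) - digits[i]
            return result
    return result  # all digits zero: -0 = 0

def first_digits(x, p, k):
    """First k p-adic digits of an integer x (possibly negative), read from x mod p^k."""
    r = x % p**k  # Python's % returns a value in [0, p^k) even for negative x
    return [(r // p**i) % p for i in range(k)]

print(padic_negate(first_digits(3, 5, 6), 5))  # [2, 4, 4, 4, 4, 4], i.e. ...444442.
print(first_digits(-3, 5, 6))                  # [2, 4, 4, 4, 4, 4]
```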


Now, we consider the p-adic representation of rational numbers. We use the 
following lemma which is essentially p-adic division by an integer and then give an 
example. 


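Lemma 7.5.3 Rational numbers $\frac{c}{d}$ have eventually periodic $p$-adic expansions. Suppose that

$$a = p^n \left(\frac{c_1}{d_1}\right)$$

with $c_1, d_1$ integers such that $(c_1, d_1) = 1$ and $p \nmid c_1 d_1$, and suppose that $a = a_n p^n + a_{n+1} p^{n+1} + \cdots$. Then

$$a_n \equiv c_1 d_1^{-1} \bmod p.$$

Consider then

$$\frac{c_1}{d_1} - a_n = p \, \frac{c_2}{d_1}, \qquad c_2 = \frac{c_1 - a_n d_1}{p} \in \mathbb{Z}.$$

Then $a_{n+1} \equiv c_2 d_1^{-1} \bmod p$, and so on.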


EXAMPLE 7.5.4 We find the 7-adic expansion of $\frac{3}{4}$.
The easiest way to proceed is to find the 7-adic expansion of $\frac{1}{4}$ and then multiply (using 7-adic multiplication) by 3. Suppose that

$$x = \frac{1}{4} = \cdots b_n b_{n-1} \cdots b_1 b_0.$$

Then $4x = 1$, so $4b_0 \equiv 1 \bmod 7$. It follows that $b_0 = 2$, so that

$$x = \cdots b_n \cdots b_1 2.$$

Multiply this by 4 to obtain

$$4x = \cdots (4b_1 + 1) 1.$$

To see this, note that $4 \cdot 2 = 8 = 7 + 1$, so we carry the 1 to get $4b_1 + 1$ in the second digit. Since the result is $1 = \cdots 0001.$, we then have

$$4b_1 + 1 \equiv 0 \bmod 7 \implies b_1 = 5.$$

Therefore

$$x = \cdots b_2 5 2.$$

Continuing in this manner we get

$$\frac{1}{4} = \cdots 15152.$$

Now, we multiply this by 3 to get the 7-adic expansion of $\frac{3}{4}$:

$$\frac{3}{4} = \cdots 15151516.$$

To see this we start the multiplication $3 \times \cdots 15152.$ at the far right to first get 6. There is nothing to carry. Then $3 \times 5 = 15 = 2 \times 7 + 1$. Hence we write down the 1 and carry the 2. Then we have $3 \cdot 1 + 2 = 5$, and we continue to the left.
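The digit rule of Lemma 7.5.3 can be run mechanically. The sketch below assumes $|x|_p \le 1$ (so $p$ does not divide the reduced denominator), keeps the denominator fixed as in the lemma, and returns the first $k$ digits, least significant first; the function name is ours. It reproduces the digits of Example 7.5.4 and makes the eventual periodicity visible.

```python
from fractions import Fraction

def padic_digits_of_fraction(x, p, k):
    """First k p-adic digits of a rational x with |x|_p <= 1, following Lemma 7.5.3:
    each digit is c * d^(-1) mod p; subtract it and divide by p to get the next c."""
    x = Fraction(x)
    c, d = x.numerator, x.denominator
    if d % p == 0:
        raise ValueError("x is not a p-adic integer")
    d_inv = pow(d, -1, p)          # inverse of d modulo p (Python 3.8+)
    digits = []
    for _ in range(k):
        digit = (c * d_inv) % p
        digits.append(digit)
        c = (c - digit * d) // p   # exact division: c - digit*d is divisible by p
    return digits

print(padic_digits_of_fraction(Fraction(1, 4), 7, 6))  # [2, 5, 1, 5, 1, 5], i.e. ...515152.
print(padic_digits_of_fraction(Fraction(3, 4), 7, 8))  # [6, 1, 5, 1, 5, 1, 5, 1], i.e. ...15151516.
```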

From the method and from the example, it is clear that if this is done for any rational number the resulting $p$-adic expansion must eventually be periodic. The proof is essentially the same as showing that the decimal expansion of any rational number must eventually be periodic. We leave the proof to the exercises.


Corollary 7.5.1 Let $p$ be a fixed prime and $x \in \mathbb{Q}_p$. Then the $p$-adic expansion of $x$ is eventually periodic if and only if $x \in \mathbb{Q}$.


For a fixed $p$ the arithmetic in $\mathbb{Q}_p$ can be done as in decimal arithmetic, but the carries must be done mod $p$ and to the left.

EXAMPLE 7.5.5 Let $x = \cdots 45213.$ and $y = \cdots 61115.$ in $\mathbb{Q}_7$. Find $x + y$.
Using carrying mod 7 we get

$$x + y = \cdots 36331.$$

To see this, we start at the far right and add $3 + 5 = 8$, which has the 7-adic representation $11.$ Hence we write the 1 and carry the 1 to the left. Then we have $1 + 1 = 2$, plus the carry 1, to get 3. We then continue to the left.
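Addition with carries, as in Example 7.5.5, can be sketched in the same hypothetical digit-list representation (least significant digit first, truncated to $k$ digits):

```python
def padic_add(x, y, p, k):
    """First k p-adic digits of x + y for digit lists x, y (least significant first)."""
    x = (list(x) + [0] * k)[:k]
    y = (list(y) + [0] * k)[:k]
    result, carry = [], 0
    for a, b in zip(x, y):
        carry, digit = divmod(a + b + carry, p)  # the carry moves one place to the left
        result.append(digit)
    return result

x = [3, 1, 2, 5, 4]   # ...45213. in Q_7 (only the digits shown in the text)
y = [5, 1, 1, 1, 6]   # ...61115.
print(padic_add(x, y, 7, 5))  # [1, 3, 3, 6, 3], i.e. ...36331.
```

Since only the five digits shown in the example are known, the carry out of the fifth digit is simply dropped here; in $\mathbb{Q}_7$ it would be added to the next (unknown) digit.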


7.6 The p-Adic Integers 


If $p$ is a fixed prime, then the $p$-adic norm is non-archimedean, and hence the norm of any rational integer $x \in \mathbb{Z}$ is less than or equal to one. It follows that the $p$-adic expansion of any rational integer extends only to the left of the decimal point. We extend this to form the ring of $p$-adic integers, which is a subring of the field of $p$-adic numbers and contains the rational integers. It is a unique factorization domain like $\mathbb{Z}$, but it has many properties quite different from those of the ordinary rational integers.


Definition 7.6.1 A $p$-adic number $a \in \mathbb{Q}_p$ is a $p$-adic integer if its $p$-adic norm is less than or equal to 1, $|a|_p \le 1$. We denote the set of $p$-adic integers by $\mathbb{Z}_p$, and hence

$$\mathbb{Z}_p = \{a \in \mathbb{Q}_p : |a|_p \le 1\}.$$


Note that $\mathbb{Z}_p$ is also used to denote the modular ring $\mathbb{Z}/p\mathbb{Z}$. For the remainder of this chapter $\mathbb{Z}_p$ will denote the ring of $p$-adic integers, and we will use $\mathbb{Z}/p\mathbb{Z}$ for the modular ring mod $p$.

Since the $p$-adic norm of a $p$-adic integer is less than or equal to 1, it follows that in the $p$-adic expansion of a $p$-adic integer the digits (possibly infinitely many) are always to the left of the decimal point. This can be taken as an alternative definition of a $p$-adic integer.


Lemma 7.6.1 A $p$-adic number $a \in \mathbb{Q}_p$ is a $p$-adic integer if and only if its canonical expansion has only nonnegative powers of $p$. That is,

$$\mathbb{Z}_p = \left\{a \in \mathbb{Q}_p : a = \sum_{i=0}^{\infty} a_i p^i\right\}.$$

The $p$-adic integers form a subring of $\mathbb{Q}_p$ which contains $\mathbb{Z}$.


Theorem 7.6.1 The set $\mathbb{Z}_p$ of $p$-adic integers forms a subring of $\mathbb{Q}_p$ which contains the rational integers $\mathbb{Z}$.


Proof It is clear that $\mathbb{Z} \subset \mathbb{Z}_p$. To show that $\mathbb{Z}_p$ is a subring we must show that it is closed under addition, multiplication and additive inverses. Since the $p$-adic norm is non-archimedean it satisfies the strong triangle inequality, and it is multiplicative with $|-a|_p = |a|_p$; hence these closure properties are straightforward. We leave the details to the exercises.


Later, we will see that this ring is actually a unique factorization domain. 
Recall that a unit in a ring $R$ with identity is an element which has a multiplicative inverse. In the rational integers $\mathbb{Z}$ the only units are $\pm 1$. The situation is quite different in $\mathbb{Z}_p$, where there are many units; in fact every rational integer $m$ relatively prime to $p$ is invertible.


Theorem 7.6.2 A $p$-adic integer $a \in \mathbb{Z}_p$ is a unit if and only if $a = \cdots a_2 a_1 a_0.$ with $a_0 \ne 0$. Hence the group of units is

$$U(\mathbb{Z}_p) = \left\{\sum_{i=0}^{\infty} a_i p^i : a_0 \ne 0\right\}.$$

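Proof Let $a \in \mathbb{Z}_p$ and suppose that

$$a = \cdots a_n \cdots a_1 a_0.$$

is the $p$-adic expansion of $a$. If $a_0 = 0$ then $|a|_p < 1$, so $|\beta a|_p < 1$ for every $\beta \in \mathbb{Z}_p$, and $a$ cannot be a unit. Now suppose $a_0 \ne 0$ and consider $\beta \in \mathbb{Z}_p$ with

$$\beta = \cdots b_n \cdots b_1 b_0.$$

Now consider the equation

$$\beta a = 1 = \cdots 001.$$

Since $a_0 \not\equiv 0 \bmod p$ we can solve for the expansion of $\beta$ in the above equation. First we would have $a_0 b_0 \equiv 1 \bmod p$, where $a_0$ and $b_0$ are $p$-adic digits. Since $a_0$ is not equal to 0 it has an inverse mod $p$, and thus there is a solution for $b_0$. Thus, we have found the first digit of $\beta$. Now we multiply again (see Section 7.3) to get

$$a_0 b_1 + a_1 b_0 + \text{carry} \equiv 0 \bmod p.$$

This is now solvable for $b_1$, and we obtain the second digit of $\beta$. Continuing in this manner we can solve $\beta a = 1$, and hence $a$ is a unit.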
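The digit-by-digit solution of $\beta a = 1$ in the proof can be sketched in Python. For simplicity this version (our own, with a hypothetical function name) recomputes the partial product $a \cdot (b_0 + \cdots + b_{n-1}p^{n-1})$ at each step to read off digit $n$ together with its carries; this is quadratic in $k$ but keeps the correspondence with the proof transparent.

```python
def padic_unit_inverse(a_digits, p, k):
    """First k digits [b_0, ..., b_(k-1)] of beta = 1/a for a p-adic unit a, where
    a_digits = [a_0, a_1, ...] (least significant first, missing digits are 0).
    Each b_n is chosen so that digit n of a * beta agrees with 1 = ...001."""
    a = (list(a_digits) + [0] * k)[:k]
    if a[0] % p == 0:
        raise ValueError("a_0 = 0, so a is not a unit")
    a0_inv = pow(a[0], -1, p)                        # inverse of a_0 modulo p (Python 3.8+)
    b = []
    for n in range(k):
        a_trunc = sum(a[i] * p**i for i in range(n + 1))
        beta_trunc = sum(b[j] * p**j for j in range(n))
        current = (a_trunc * beta_trunc // p**n) % p  # digit n so far, carries included
        target = 1 if n == 0 else 0
        b.append(((target - current) * a0_inv) % p)   # a_0 * b_n fixes digit n
    return b

print(padic_unit_inverse([4], 7, 6))  # [2, 5, 1, 5, 1, 5], the digits of 1/4 in Example 7.5.4
```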


The next result shows that any nonzero element of $\mathbb{Q}_p$ is a product of an invertible $p$-adic integer and a power of $p$.


Lemma 7.6.2 Let $x \in \mathbb{Q}_p$ with $|x|_p = p^{-n}$. Then $x = p^n u$ with $u \in U(\mathbb{Z}_p)$.


Proof Let $x \in \mathbb{Q}_p$ with $|x|_p = p^{-n}$. Then $u = p^{-n} x$ has norm 1, since $|p^{-n} x|_p = |p^{-n}|_p |x|_p = p^n p^{-n} = 1$. Since $|u|_p = 1$, it follows that $u$ is a $p$-adic integer, and further, since its norm is exactly 1, its first $p$-adic digit is nonzero. From the previous theorem it is invertible and hence a $p$-adic unit.


Recall that any integral domain can be embedded into its field of fractions, which is the smallest field containing it. The field of fractions of $\mathbb{Z}$ is of course $\mathbb{Q}$. This last lemma shows that $\mathbb{Q}_p$ is actually the field of fractions of $\mathbb{Z}_p$.
Theorem 7.6.3. The field of $p$-adic numbers is the field of fractions of the ring of $p$-adic integers.


This last theorem provides an alternative approach to the construction of the $p$-adic numbers. Start with $p$-ary expansions of integers and complete them to form the ring of $p$-adic integers $\mathbb{Z}_p$. Then take the field of fractions of $\mathbb{Z}_p$ to obtain the field of $p$-adic numbers, and show that this field is complete. This was the approach originally followed by Hensel. We refer to [H] for details.


7.6.1 Principal Ideals and Unique Factorization 


Although the $p$-adic integers differ radically from the rational integers in the structure of their unit groups, here we show that the $p$-adic integers $\mathbb{Z}_p$, like the rational integers $\mathbb{Z}$, form a unique factorization domain.

Recall from Chapter 6 (see Section 6.2) that a unique factorization domain, or UFD, is an integral domain $R$ such that for each $r \in R$ either $r = 0$, $r$ is a unit, or $r$ has a factorization into primes which is unique up to ordering and unit factors.

In this more general algebraic language, the Fundamental Theorem of Arithmetic states that the rational integers $\mathbb{Z}$ form a UFD. Gauss proved that the complex integers also form a UFD, and the ring of polynomials over any field $F$ is a UFD as well (see Chapter 6).

In Chapter 6, we also examined principal ideal domains, abbreviated as PIDs, which are integral domains in which every ideal is a principal ideal. We showed that any principal ideal domain is a UFD. Using the $p$-adic norm, we show that any ideal in the $p$-adic integers $\mathbb{Z}_p$ is either $(0)$ or $p^k \mathbb{Z}_p$ for some $k \in \mathbb{N} \cup \{0\}$. It follows that $\mathbb{Z}_p$ is a principal ideal domain and therefore a unique factorization domain. Further, $\mathbb{Z}_p$ has a unique maximal ideal.


Theorem 7.6.4 The ring of $p$-adic integers $\mathbb{Z}_p$ is a principal ideal domain. The ideals are the principal ideal $(0)$ and $p^k \mathbb{Z}_p$ for all $k \in \mathbb{N} \cup \{0\}$. The ideal $p\mathbb{Z}_p = \mathbb{Z}_p \setminus U(\mathbb{Z}_p)$ is the unique maximal ideal.


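Proof Let $a \in \mathbb{Z}_p$ with $a = \sum_{i=0}^{\infty} a_i p^i$. Consider the evaluation map from the $p$-adic integers to the integers modulo $p$, $f : \mathbb{Z}_p \to \mathbb{Z}/p\mathbb{Z}$, given by $f(a) = a_0$. For a prime $p$ the modular integers $\mathbb{Z}/p\mathbb{Z}$ form a field. The evaluation map is a ring homomorphism onto this finite field with kernel $p\mathbb{Z}_p$, and hence $p\mathbb{Z}_p$ is a maximal ideal. We show that it is unique.

Let $J$ be another proper maximal ideal; we show $J = p\mathbb{Z}_p$. It is clear that $p\mathbb{Z}_p$ contains all the $p$-adic integers with norm strictly less than 1. Suppose that $a \in J$. If $|a|_p = 1$ then $a$ is a unit in $\mathbb{Z}_p$ and $J = \mathbb{Z}_p$. Therefore, if $J$ is a proper ideal it follows that $|a|_p < 1$, and hence $a \in p\mathbb{Z}_p$. Therefore $J \subset p\mathbb{Z}_p$, and by maximality $J = p\mathbb{Z}_p$.

From the proof above it follows that if $I$ is any proper ideal in $\mathbb{Z}_p$ we must have $I \subset p\mathbb{Z}_p$. The nonzero ideals in $p\mathbb{Z}_p$ are precisely the principal ideals $p^n \mathbb{Z}_p$ for some natural number $n$: if $I \ne (0)$, choose $a \in I$ of largest norm $|a|_p = p^{-n}$ (the norm takes only the values $p^{-k}$ on nonzero elements); by Lemma 7.6.2, $a = p^n u$ with $u$ a unit, so $p^n \in I$, and every $b \in I$ satisfies $|b|_p \le p^{-n}$, so $b = p^n (p^{-n} b)$ with $p^{-n} b \in \mathbb{Z}_p$. Therefore, $\mathbb{Z}_p$ is a principal ideal domain with unique maximal ideal $p\mathbb{Z}_p$. □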


Since a PID must be a unique factorization domain, we have the following corollary.


Corollary 7.6.1 The ring of $p$-adic integers $\mathbb{Z}_p$ is a unique factorization domain.


We mention that the development of the $p$-adic integers as a PID can be generalized to what are termed discrete valuation rings. A discrete valuation is an integer valuation on a field $K$, that is, a function

$$\nu : K \to \mathbb{Z} \cup \{\infty\}$$

satisfying the conditions

(1) $\nu(xy) = \nu(x) + \nu(y)$,

(2) $\nu(x + y) \ge \min(\nu(x), \nu(y))$,

(3) $\nu(x) = \infty$ iff $x = 0$.

A field with a non-trivial discrete valuation is called a discrete valuation field. A discrete valuation ring is the ring $\{x \in K : \nu(x) \ge 0\}$ of a discrete valuation field $K$. The $p$-adic valuation $\operatorname{ord}$, extended to $\mathbb{Q}_p$ through $|x|_p = p^{-\operatorname{ord}(x)}$, defines a discrete valuation, and $\mathbb{Z}_p = \{x \in \mathbb{Q}_p : \operatorname{ord}(x) \ge 0\}$; hence $\mathbb{Z}_p$ is a discrete valuation ring.

It can be proved that a discrete valuation ring is a principal ideal domain and that any irreducible element generates its unique maximal ideal.


7.6.2 The Completeness of $\mathbb{Z}_p$


Consider a convergent sequence of $p$-adic integers $(x_n)$ with $\lim x_n = x$, where the limit is taken with respect to the $p$-adic norm. Since each $x_n \in \mathbb{Z}_p$ we have $|x_n|_p \le 1$, and since the norm is continuous, $|x|_p = \lim |x_n|_p \le 1$ also. It follows that $x$ is also a $p$-adic integer, and hence the limit of any convergent sequence of $p$-adic integers is a $p$-adic integer. Thus, as a subset of the metric space $\mathbb{Q}_p$, the set $\mathbb{Z}_p$ is closed. A closed subset of a complete metric space is also complete, and therefore the $p$-adic integers are complete. We have thus proved the following.


Theorem 7.6.5 The $p$-adic integers $\mathbb{Z}_p$ are complete as a metric subspace of the field of $p$-adic numbers $\mathbb{Q}_p$.
7.7 Ostrowski’s Theorem


We have seen that the field of real numbers is, up to isomorphism, the only archimedean completion of $\mathbb{Q}$. That is, if $F$ is any other complete archimedean ordered field that contains $\mathbb{Q}$ as a dense subset, then $F$ is isomorphic to $\mathbb{R}$. Ostrowski’s theorem, which we present in this section, says that besides the reals, the only completions of $\mathbb{Q}$ are the fields of $p$-adic numbers.


Theorem 7.7.1 (Ostrowski) Every nontrivial norm $|\cdot|$ on $\mathbb{Q}$ is equivalent either to the ordinary absolute value or to a $p$-adic norm $|\cdot|_p$ for some prime $p$. Therefore, the only completions of $\mathbb{Q}$ with respect to a nontrivial norm are the reals $\mathbb{R}$ and the $p$-adic fields $\mathbb{Q}_p$.


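Proof Let $|\cdot|$ be a norm on $\mathbb{Q}$. Assume first that it is archimedean. Then there exists an integer $n$ with $|n| > 1$. Let $n_0$ be the least such positive integer and write $|n_0| = n_0^\alpha$ with $\alpha > 0$. We show that $|n| = n^\alpha$ for all positive integers $n$.

Write $n$ in its $n_0$-expansion, so that $n = a_0 + a_1 n_0 + \cdots + a_s n_0^s$ with $0 \le a_i < n_0$ and $a_s \ne 0$. By our assumption on $n_0$ we have $|a_i| \le 1$ for all $i$. Therefore, since $n_0^s \le n$,

$$|n| \le \sum_{i=0}^{s} |a_i| \, n_0^{i\alpha} \le \sum_{i=0}^{s} n_0^{i\alpha} \le n^\alpha \sum_{j=0}^{\infty} n_0^{-j\alpha} = C n^\alpha,$$

where $C$ depends only on $n_0$ and $\alpha$. Applying this to $n^N$ gives $|n|^N \le C n^{N\alpha}$, that is, $|n| \le C^{1/N} n^\alpha$. Letting $N \to \infty$ we get that $|n| \le n^\alpha$.

Using the expansion again in a similar way gives $|n| \ge n^\alpha$, so therefore $|n| = n^\alpha$. This then implies that if $q \in \mathbb{Q}$ with $q > 0$ then $|q| = q^\alpha$, and since $|-q| = |q|$ the norm is equivalent to the absolute value. Therefore, if the norm is archimedean it is equivalent to the absolute value.

Now suppose the norm is non-archimedean. Then $|n| \le 1$ for all integers $n$. Since the norm is nontrivial, some positive integer has norm less than 1 (otherwise $|q| = 1$ for every nonzero rational $q$); let $n_0$ be the least positive integer for which $|n_0| < 1$. We claim first that $n_0$ is a prime. If not, $n_0 = n_1 n_2$ with $1 < n_1, n_2 < n_0$. By the minimality of $n_0$ we have $|n_1| = |n_2| = 1$, and hence $|n_0| = |n_1||n_2| = 1$, a contradiction. Therefore $n_0 = p$ is a prime, and we claim the norm is equivalent to the $p$-adic norm.

If $p$ does not divide $n$, then $n = rp + s$ with $0 < s < p$, and $|s| = 1$ by minimality. But then $|rp| = |r||p| < 1$, and so $|n - s| < |s|$; since the strong triangle inequality becomes an equality when the two norms differ, $|n| = |s| = 1$. Thus, if $p$ does not divide $n$ we have $|n| = 1$. Given $n \in \mathbb{N}$ we have $n = p^k m$ with $(m, p) = 1$. Then $|n| = |p^k||m| = |p|^k$. Since $|p| < 1$ we have $|p| = p^{-\alpha}$ for some $\alpha > 0$, and hence $|n| = (p^{-k})^\alpha = |n|_p^\alpha$. By multiplicativity this extends to all of $\mathbb{Q}$, so this norm is equivalent to the $p$-adic norm.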


7.8 Hensel’s Lemma and Applications


For a fixed prime $p$, the $p$-adic numbers have many applications to ordinary number theory, especially to solving congruences modulo powers of $p$. Important in this regard is Hensel’s Lemma. First, we define congruence in $\mathbb{Q}_p$.


Definition 7.8.1. $a \equiv b \bmod p^n$ in $\mathbb{Q}_p$ if $|a - b|_p \le p^{-n}$.


Now, we present Hensel’s Lemma, which is a result in modular arithmetic. The lemma says that if a polynomial equation has a simple root modulo a prime number $p$, then this root lifts to a unique root of the same equation modulo any higher power of $p$, and hence to a unique root in $\mathbb{Z}_p$. This root can be found by iteratively lifting the solution modulo successive powers of $p$, and the procedure is an analog of Newton’s method.
Theorem 7.8.1 (Hensel’s Lemma) Let $f(x) = c_0 + c_1 x + \cdots + c_n x^n$ be a polynomial in $\mathbb{Z}_p[x]$ (the coefficients are $p$-adic integers). Let $f'(x)$ be the formal derivative of $f(x)$. Suppose $\tilde{a}_0 \in \mathbb{Z}_p$ with $f(\tilde{a}_0) \equiv 0 \bmod p$ and $f'(\tilde{a}_0) \not\equiv 0 \bmod p$. Then there exists a unique $p$-adic integer $a$ such that $f(a) = 0$ and $a \equiv \tilde{a}_0 \bmod p$.


As preparation for the proof of Hensel’s lemma we recall Newton’s method for solving a nonlinear equation $f(x) = 0$ over the reals, where $f(x)$ is a differentiable real-valued function. We start with an initial guess $x_0$. This initial guess must be sufficiently close to a solution for the method to work, but we will ignore this here and refer to [A] for the technical requirements. Given $x_0$ we form the tangent line to the curve $y = f(x)$ at the point $(x_0, f(x_0))$. This has the equation

$$y - f(x_0) = f'(x_0)(x - x_0).$$

Let $x_1$ be where the tangent line crosses the $x$-axis, that is, where $y = 0$. We then have

$$-f(x_0) = f'(x_0)(x_1 - x_0) \implies x_1 = x_0 - \frac{f(x_0)}{f'(x_0)},$$

assuming that $f'(x_0) \ne 0$. This provides the initial step in an iteration scheme. Consider the tangent line at $(x_1, f(x_1))$ and obtain

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} \quad \text{assuming } f'(x_1) \ne 0,$$

and in general

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \quad \text{assuming } f'(x_n) \ne 0.$$

Under appropriate conditions (see [A]) this iteration scheme will converge to a solution of $f(x) = 0$. How close the initial guess must be to a solution for the method to converge depends on the function $f(x)$ (see [A]).
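A minimal Python sketch of the real iteration, applied to the example $f(x) = x^2 - 2$ with $x_0 = 1$ (our own choice, not from the text); the function name `newton` is illustrative.

```python
def newton(f, fprime, x0, steps):
    """Real Newton iteration x_(n+1) = x_n - f(x_n)/f'(x_n); returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        d = fprime(x)
        if d == 0:
            raise ZeroDivisionError("f'(x_n) = 0: the iteration is undefined")
        xs.append(x - f(x) / d)
    return xs

iterates = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 5)
print(iterates[-1])  # approximately 1.41421356..., the positive square root of 2
```

No safeguards against divergence are included; as noted above, convergence depends on the starting point.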

This method can be applied to polynomial equations $P(x) = 0$ over the reals. The proof of Hensel’s lemma in the $p$-adic field $\mathbb{Q}_p$ uses a $p$-adic version of Newton’s technique.


Proof Let $f(x)$ be a $p$-adic integral polynomial, that is, $f(x)$ has $p$-adic integer coefficients, and let $\tilde{a}_0$ be as in the statement of Hensel’s lemma. We will prove the existence of a solution $a$ by inductively constructing its canonical $p$-adic expansion

$$a = d_0 + d_1 p + \cdots + d_k p^k + \cdots$$

where the $d_i$ are $p$-adic digits to be determined. Let $a_k$ be the $k$-th convergent of $a$,

$$a_k = d_0 + d_1 p + \cdots + d_k p^k.$$
We will use induction and a $p$-adic version of Newton’s method to show that we can find $p$-adic digits so that $f(a_k) \equiv 0 \bmod p^{k+1}$ and $a_k \equiv \tilde{a}_0 \bmod p$. Then, as $a_k \to a$, we have $a$ as the desired solution.

Let $\tilde{a}_0$ have the canonical $p$-adic expansion

$$\tilde{a}_0 = b_0 + b_1 p + \cdots + b_k p^k + \cdots.$$

Take $d_0 = a_0 = b_0$. Then $a_0 \equiv \tilde{a}_0 \bmod p$, and so $f(a_0) \equiv f(\tilde{a}_0) \equiv 0 \bmod p$. This establishes the base case of the induction.
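Now, suppose we have $a_{k-1}$ satisfying $f(a_{k-1}) \equiv 0 \bmod p^k$ and $a_{k-1} \equiv \tilde{a}_0 \bmod p$. Now let

$$a_k = a_{k-1} + d_k p^k,$$

where $d_k$ is a $p$-adic digit to be determined. Then

$$f(a_k) = f(a_{k-1} + d_k p^k) = \sum_{i=0}^{n} c_i (a_{k-1} + d_k p^k)^i.$$

Expanding each power by the binomial theorem,

$$f(a_k) = c_0 + \sum_{i=1}^{n} c_i \left(a_{k-1}^i + i\, a_{k-1}^{i-1} d_k p^k + \text{terms divisible by } p^{2k}\right).$$

Since $2k \ge k + 1$, this implies that

$$f(a_k) \equiv f(a_{k-1}) + d_k p^k f'(a_{k-1}) \bmod p^{k+1}.$$

By the inductive hypothesis we have $f(a_{k-1}) \equiv 0 \bmod p^k$, and hence there is a $p$-adic digit $e_k$ with

$$f(a_k) \equiv e_k p^k + d_k p^k f'(a_{k-1}) \bmod p^{k+1}.$$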


To obtain the appropriate digit $d_k$ we must then have

$$e_k + d_k f'(a_{k-1}) \equiv 0 \bmod p.$$

Since $a_{k-1} \equiv \tilde{a}_0 \bmod p$, we have $f'(a_{k-1}) \equiv f'(\tilde{a}_0) \not\equiv 0 \bmod p$. Therefore, the digit $d_k$ can be found by

$$d_k \equiv -\frac{e_k}{f'(a_{k-1})} \bmod p,$$

and hence $f(a_k) \equiv 0 \bmod p^{k+1}$. Notice that computing the $p$-adic digits uses essentially the same iteration scheme as Newton’s method over the reals.
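Now consider

$$a = d_0 + d_1 p + \cdots + d_k p^k + \cdots.$$

Since $a \equiv a_k \bmod p^{k+1}$, we have $f(a) \equiv f(a_k) \equiv 0 \bmod p^{k+1}$ for all $k$, and we must have $f(a) = 0$.

For uniqueness, suppose $a'$ is any root with $a' \equiv \tilde{a}_0 \bmod p$, and let $a'_k$ be its convergents. Then $f(a'_k) \equiv f(a') = 0 \bmod p^{k+1}$ for every $k$. Since $a'_0 = d_0$, assume inductively that $a'_{k-1} = a_{k-1}$ and write $a'_k = a_{k-1} + d'_k p^k$ with $d'_k$ a $p$-adic digit. By the computation above, $d'_k$ satisfies $e_k + d'_k f'(a_{k-1}) \equiv 0 \bmod p$, which has exactly one solution among the $p$-adic digits, so $d'_k = d_k$. The uniqueness of $a$ thus follows from the uniqueness of the sequence of convergents $a_k$.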


The proof of Hensel’s lemma provides an algorithm for constructing the solution to an equation $f(x) = 0$ with $f(x) \in \mathbb{Z}_p[x]$. This algorithm is analogous to Newton’s method for solving real polynomial equations.

Suppose $\tilde{a}_0$ is a solution to $f(x) \equiv 0 \bmod p$. Then follow the procedure outlined in the proof. Take $d_0$ to be the first $p$-adic digit of $\tilde{a}_0$ and let $a_0 = d_0$. Let $a_k = a_{k-1} + d_k p^k$ and iteratively find the digits $d_k$ by

$$d_k \equiv -\frac{f(a_{k-1})/p^k}{f'(a_{k-1})} \bmod p \quad \text{for } k \ge 1.$$
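The algorithm can be sketched in Python as follows. As a simplifying assumption, the coefficients are ordinary integers rather than general $p$-adic integers, and the helper names are hypothetical. The example $x^2 - 2$ over $\mathbb{Z}_7$ is our own choice: $3^2 = 9 \equiv 2 \bmod 7$ and $f'(3) = 6 \not\equiv 0 \bmod 7$, so Hensel's lemma applies.

```python
def poly_eval(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_n x^n (coeffs = [c_0, ..., c_n]) by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Coefficients [c_1, 2 c_2, ..., n c_n] of the formal derivative f'(x)."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def hensel_lift(coeffs, a0, p, k):
    """First k p-adic digits of the root a of f with a = a0 mod p, using the digit
    rule d_j = -(f(a_(j-1)) / p^j) / f'(a_(j-1)) mod p from the proof above.
    Assumes integer coefficients, f(a0) = 0 mod p and f'(a0) != 0 mod p."""
    deriv = poly_derivative(coeffs)
    if poly_eval(coeffs, a0) % p != 0 or poly_eval(deriv, a0) % p == 0:
        raise ValueError("need f(a0) = 0 mod p and f'(a0) != 0 mod p")
    digits = [a0 % p]
    a = digits[0]                                   # the convergent a_0 = d_0
    for j in range(1, k):
        value = poly_eval(coeffs, a)                # divisible by p^j by induction
        e = (value // p**j) % p                     # the digit e_j
        inv = pow(poly_eval(deriv, a) % p, -1, p)   # f'(a_(j-1)) is invertible mod p
        d = (-e * inv) % p
        digits.append(d)
        a += d * p**j                               # a_j = a_(j-1) + d_j p^j
    return digits

# square root of 2 in Z_7, starting from 3^2 = 9 = 2 mod 7
print(hensel_lift([-2, 0, 1], 3, 7, 6))  # [3, 1, 2, 6, 1, 2]
```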


Theorem 7.8.2 A polynomial with rational integer coefficients (in $\mathbb{Z}[x]$) has a root in $\mathbb{Z}_p$ if and only if it has an integer root modulo $p^k$ for every $k \ge 1$.


Proof Suppose that $f(x) \in \mathbb{Z}[x]$ and suppose that $f(a) = 0$ where $a \in \mathbb{Z}_p$. Then, from the proof of Hensel's lemma, there exists a sequence of integers $(a_k)$ with $a_k \equiv a \pmod{p^k}$. Since $f(a_k) \equiv f(a) \pmod{p^k}$ and $f(a) = 0$, we must have an integer solution mod $p^k$ for each $k$.

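Conversely, suppose that for each $k$ there is an integer $a_k$ with $f(a_k) \equiv 0 \pmod{p^k}$. The p-adic integers are complete and, being totally bounded, compact (see the exercises), so the sequence $(a_k)$ has a convergent subsequence $(a_{k_j})$. Suppose that the limit of this subsequence is $a \in \mathbb{Z}_p$. A polynomial is a continuous function on any normed field (see exercises), and hence

$$f(a) = \lim_{j \to \infty} f(a_{k_j}).$$

However $f(a_{k_j}) \equiv 0 \pmod{p^{k_j}}$ with $k_j \to \infty$, so for each fixed $k$ we have $f(a_{k_j}) \equiv 0 \pmod{p^k}$ for all large $j$. Therefore $f(a) \equiv 0 \pmod{p^k}$ for all $k$, and hence $f(a) = 0$.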
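Theorem 7.8.2 can be illustrated (though not proved) by brute force for small moduli. The sketch below uses a hypothetical helper `roots_mod` that lists all roots of $f$ modulo $p^k$. For $x^2 - 2$ with $p = 7$ there are roots at every level; $x^2 - 7$ has the root $0$ modulo $7$ but none modulo $49$, which shows why the condition must hold for every $k$ (here $f'(0) \equiv 0 \pmod 7$, so Hensel's lemma does not apply). The polynomial $x^2 - 3$ has no root modulo $7$, so by Corollary 7.8.1 it has no root in $\mathbb{Z}_7$.

```python
def roots_mod(coeffs, p, k):
    """All x in [0, p^k) with f(x) = 0 mod p^k, by brute force (only sensible for small p^k).

    coeffs[i] is the coefficient of x^i.
    """
    m = p ** k
    return [x for x in range(m)
            if sum(c * pow(x, i, m) for i, c in enumerate(coeffs)) % m == 0]


for k in range(1, 4):
    # x^2 - 2: [3, 4], [10, 39], [108, 235]; x^2 - 7: [0], [], []
    print(k, roots_mod([-2, 0, 1], 7, k), roots_mod([-7, 0, 1], 7, k))
```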


Corollary 7.8.1 If a polynomial $F(x)$ with integer coefficients has no roots modulo $p$, then it has no roots in $\mathbb{Z}_p$.


Hensel's lemma can be used to describe the roots of unity in $\mathbb{Q}_p$.


Theorem 7.8.3 For any prime $p$ and any $m$ with $(m, p) = 1$, there exists a primitive m-th root of unity in $\mathbb{Q}_p$ if and only if $m \mid (p-1)$. In this case every m-th root of unity is also a $(p-1)$-th root of unity. The set of $(p-1)$-th roots of unity forms a cyclic subgroup of $U(\mathbb{Z}_p)$ of order $p-1$.
Proof If $m \mid (p-1)$ then $p - 1 = km$, and hence every m-th root of unity in $\mathbb{Q}_p$ is also a $(p-1)$-th root of unity. Consider the polynomial $f(x) = x^{p-1} - 1$. Its formal derivative is $f'(x) = (p-1)x^{p-2}$. Now let $a$ be a rational integer with $1 \le a \le p-1$. Then from Fermat's theorem we have $f(a) \equiv 0 \pmod p$, and further $f'(a) \not\equiv 0 \pmod p$ since $|f'(a)|_p = 1$. Therefore Hensel's lemma lifts each of the $p-1$ residues $a$ to a root of $f$ in $\mathbb{Z}_p$. These roots are distinct, since their residues mod $p$ are distinct, and since $f$ has degree $p-1$ there are exactly $p-1$ solutions to $f(x) = 0$; they are all $(p-1)$-th roots of unity.

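Conversely, suppose that $a \in \mathbb{Q}_p$ is a primitive m-th root of unity. Then $|a|_p = 1$, so $a$ is a p-adic integer. Write $a = a_0 + a_1 p + a_2 p^2 + \cdots$; then $a \equiv a_0 \pmod p$ and hence $a_0^m \equiv 1 \pmod p$. Since $(m, p) = 1$, the derivative $m x^{m-1}$ of $x^m - 1$ does not vanish modulo $p$ at any unit, so by the uniqueness part of Hensel's lemma distinct m-th roots of unity in $\mathbb{Z}_p$ remain distinct modulo $p$. In particular $a^d \not\equiv 1 \pmod p$ for $0 < d < m$, so $a_0$ has order exactly $m$ modulo $p$. Since $a_0$ is a rational integer, this implies that $m \mid (p-1)$.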

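The set of $(p-1)$-th roots of unity in $\mathbb{Q}_p$ is then a finite subgroup of the multiplicative group of a field, and as we saw in Theorem 2.4.13 this must be cyclic. In particular, for each $m \mid (p-1)$ it contains an element of order $m$, that is, a primitive m-th root of unity.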
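The proof can be checked numerically by lifting each residue $1, \ldots, p-1$ to a root of $x^{p-1} - 1$ modulo $p^n$ with the digit iteration of Hensel's lemma. (These lifts are commonly called Teichmüller representatives; the function names below are our own.) The sketch assumes $p$ is an odd prime and Python 3.8+. For $p = 7$ the orders of the lifts modulo $7^3$ are $1, 2, 3, 3, 6, 6$, exactly the pattern of a cyclic group of order $6$, which contains an element of each order $m \mid 6$.

```python
def teichmuller_lifts(p, n):
    """Lift each a in 1..p-1 to a root of f(x) = x^(p-1) - 1 modulo p^n (p an odd prime)."""
    lifts = []
    for a0 in range(1, p):
        a = a0
        for k in range(1, n):
            m = p ** (k + 1)
            e_k = ((pow(a, p - 1, m) - 1) % m) // p ** k    # f(a_{k-1}) = e_k p^k mod p^(k+1)
            f1 = (p - 1) * pow(a, p - 2, p) % p              # f'(a_{k-1}) mod p, a unit
            d_k = (-e_k * pow(f1, -1, p)) % p
            a += d_k * p ** k
        lifts.append(a)
    return lifts


def mult_order(x, m):
    """Multiplicative order of x modulo m (x must be a unit mod m)."""
    k, y = 1, x % m
    while y != 1:
        y = y * x % m
        k += 1
    return k


p, n = 7, 3
roots = teichmuller_lifts(p, n)
print([mult_order(r, p ** n) for r in roots])  # orders for a0 = 1..6: [1, 3, 6, 3, 6, 2]
```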


As we have seen in this book, quadratic residues modulo a prime are important in several different areas of number theory. In fact, determining quadratic residues was crucial in the Rabin encryption system. The final result of this section ties quadratic residues modulo a prime $p$ to square roots in the p-adic integers.


Lemma 7.8.1 A rational integer $a$ not divisible by $p$ has a square root in $\mathbb{Z}_p$ ($p \ne 2$) if and only if $a$ is a quadratic residue modulo $p$.


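Proof Let $a \in \mathbb{Z}$ with $(a, p) = 1$. Consider the polynomial $P(x) = x^2 - a$ in $\mathbb{Z}_p[x]$. Suppose that $a$ is a quadratic residue mod $p$. Then there exists $\alpha_0 \in \{1, 2, \ldots, p-1\}$ with $\alpha_0^2 \equiv a \pmod p$. Further $P'(x) = 2x$ and $P'(\alpha_0) = 2\alpha_0 \not\equiv 0 \pmod p$, since $p \ne 2$ and $\alpha_0 \not\equiv 0 \pmod p$. Therefore, by Hensel's lemma, $P(x)$ has a solution in $\mathbb{Z}_p$.
Conversely, suppose that $a$ is not a quadratic residue. Then $P(x) \not\equiv 0 \pmod p$ for every integer $x$, and hence $P(x) \not\equiv 0 \pmod{p^k}$ for any $k$. It follows from Theorem 7.8.2 that $P(x)$ can have no solution in $\mathbb{Z}_p$.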
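Lemma 7.8.1 translates directly into a procedure for computing p-adic square roots. The sketch below is an illustration under stated assumptions: $p$ is odd and does not divide $a$, the quadratic-residue test uses Euler's criterion $a^{(p-1)/2} \equiv 1 \pmod p$ (a standard test not used in the text), and `padic_sqrt` is a hypothetical name. It returns an integer $x$ with $x^2 \equiv a \pmod{p^n}$, or `None` when $a$ is a non-residue and hence has no square root in $\mathbb{Z}_p$.

```python
def is_qr(a, p):
    """Euler's criterion: a is a quadratic residue mod the odd prime p (p not dividing a)."""
    return pow(a, (p - 1) // 2, p) == 1


def padic_sqrt(a, p, n):
    """Return x with x*x = a mod p**n via Hensel lifting, or None if a is a non-residue mod p."""
    if p == 2 or a % p == 0:
        raise ValueError("sketch assumes p odd and p not dividing a")
    if not is_qr(a, p):
        return None                                          # no square root in Z_p
    x = next(r for r in range(1, p) if (r * r - a) % p == 0)  # alpha_0
    for k in range(1, n):
        m = p ** (k + 1)
        e_k = ((x * x - a) % m) // p ** k                    # P(a_{k-1}) = e_k p^k mod p^(k+1)
        d_k = (-e_k * pow(2 * x, -1, p)) % p                 # P'(a_{k-1}) = 2 a_{k-1}, a unit mod p
        x += d_k * p ** k
    return x


print(padic_sqrt(2, 7, 5), padic_sqrt(3, 7, 5))  # a square root of 2 mod 7^5, then None
```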


7.8.1 The Non-isomorphism of the p-Adic Fields 


Since each p-adic field is non-archimedean, we have seen from the characterization of $\mathbb{R}$ that for any prime $p$ the p-adic field $\mathbb{Q}_p$ is not isomorphic to $\mathbb{R}$. In the next theorem we use the results on square roots in $\mathbb{Q}_p$ to provide another proof of this and to show that p-adic fields for different primes are non-isomorphic.


Theorem 7.8.4 The p-adic field $\mathbb{Q}_p$ is not isomorphic to $\mathbb{R}$ for any prime $p$. Further, if $p_1$ and $p_2$ are distinct primes, then the corresponding p-adic fields are non-isomorphic.


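Proof Let $p$ be a prime and suppose that $f: \mathbb{R} \to \mathbb{Q}_p$ is an isomorphism. Since $f(1) = 1$, $f$ fixes every integer, so $f(p) = p$. Now $p$ has a square root in $\mathbb{R}$, and hence by the isomorphism $f(p) = p$ has a square root in $\mathbb{Q}_p$. However, $p$ has no square root in $\mathbb{Q}_p$: if $x^2 = p$ then $|x|_p^2 = |p|_p = p^{-1}$, which is impossible because the norm of every nonzero element of $\mathbb{Q}_p$ is an integral power of $p$, so $|x|_p^2$ is an even power of $p$. This provides a contradiction.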

If $p_1 \ne p_2$, then there are $\frac{p_1 - 1}{2}$ quadratic residues mod $p_1$ and $\frac{p_2 - 1}{2}$ quadratic residues mod $p_2$. It follows that if $p_2 > p_1$ there must exist an integer $a$ which is a quadratic residue mod $p_2$ but not mod $p_1$. Use this integer $a$ and then follow the same proof as above. We leave the details to the exercises.
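For distinct odd primes, an integer $a$ of the kind used above can be found by a direct search, sketched below with a hypothetical helper `separating_integer` that uses Euler's criterion. Such an $a$, prime to both primes, has a square root in $\mathbb{Z}_{p_2}$ by Lemma 7.8.1 but none in $\mathbb{Q}_{p_1}$ (a square root of a unit would itself be a unit, hence lie in $\mathbb{Z}_{p_1}$). Since a field isomorphism fixes the integers, this rules out $\mathbb{Q}_{p_1} \cong \mathbb{Q}_{p_2}$. The sketch assumes both primes are odd; the case in which one of them is $2$ needs a separate argument.

```python
def separating_integer(p1, p2, bound=10_000):
    """Smallest a in [1, bound) prime to p1*p2 that is a quadratic residue mod p2
    but a non-residue mod p1 (p1, p2 distinct odd primes), via Euler's criterion."""
    for a in range(1, bound):
        if a % p1 == 0 or a % p2 == 0:
            continue
        residue_mod_p2 = pow(a, (p2 - 1) // 2, p2) == 1
        nonresidue_mod_p1 = pow(a, (p1 - 1) // 2, p1) == p1 - 1
        if residue_mod_p2 and nonresidue_mod_p1:
            return a
    return None


# 11 = 1 mod 5 is a square mod 5, while 11 = 2 mod 3 is not a square mod 3.
print(separating_integer(3, 5))  # prints: 11
```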


As a final illustration of the utility of Hensel's lemma and of the p-adic fields in general, we mention without proof the local-global principle of Hasse. The rational numbers $\mathbb{Q}$ are called a global field, while its completions, the real numbers $\mathbb{R}$ and the p-adic fields $\mathbb{Q}_p$, are called local fields. Any relationship among a set of rational numbers which is true globally, that is, in $\mathbb{Q}$, is also true locally, that is, in $\mathbb{R}$ and in all the p-adic fields $\mathbb{Q}_p$.

Hasse's Global-Local Principle provides a partial converse for equations involving quadratic forms with integer coefficients:

$$\sum_{i,j} a_{ij} x_i x_j + \sum_i b_i x_i + c = 0.$$

If such an equation has solutions in $\mathbb{R}$ and in $\mathbb{Q}_p$ for every prime $p$, then it has a rational solution in $\mathbb{Q}$. In other words, a quadratic equation with integer coefficients has a global solution, that is, one in $\mathbb{Q}$, if and only if it has solutions in all the local fields, that is, in $\mathbb{R}$ and in $\mathbb{Q}_p$ for all $p$.


7.9 Exercises 


7.1 Find the p-adic norm and p-adic expansion in Q; of: 
(a) 15 
(b) $-1$
(c) $-3$
(d)} 


7.2 Describe in detail, analogously to the construction of $\mathbb{R}$, the Cauchy completion of the rational numbers $\mathbb{Q}$ equipped with the p-adic norm for a prime $p$.

7.3 Fill in the details of the proof of Theorem 7.8.4, that is, if $p_1 \ne p_2$ then the p-adic fields $\mathbb{Q}_{p_1}$ and $\mathbb{Q}_{p_2}$ are not isomorphic.

7.4 Let $p$ be a prime number and $\mathbb{Z}_p$ the p-adic integers. Show that $\mathbb{Z}_p / p^n \mathbb{Z}_p$ is isomorphic to $\mathbb{Z} / p^n \mathbb{Z}$ for any $n > 0$.

7.5 Let $p$ be a prime number and $\mathbb{Z}_p$ the p-adic integers. Show that the additive group of $\mathbb{Z}_p$ is torsion-free.

7.6 Use the algorithm in the proof of Hensel's Lemma to find a solution (if one exists) of the polynomial equations:

(a) x? — 3x7 +2x+1=0inQ, 

(b) x*—6in Qn 

7.7 Complete the proof that a p-adic expansion for x is periodic if and only if x 
is rational. 

7.8 Show that if $x \in \mathbb{Q}_p$ and $x \equiv 0 \pmod{p^k}$ for all $k \ge 1$, then $x = 0$.
7.9 Let $f(x) \in \mathbb{Q}_p[x]$, that is, a polynomial with p-adic coefficients. Show that $f(x)$ is a continuous function on $\mathbb{Q}_p$.

7.10 Complete the proof of Theorem 7.8.4 and show that if $p_1, p_2$ are distinct primes then the corresponding p-adic fields are non-isomorphic.

7.11 Prove that the rationals $\mathbb{Q}$ are dense in $\mathbb{Q}_p$.

7.12 Prove that the p-adic integers $\mathbb{Z}_p$ are compact as a metric space with the metric induced by the p-adic norm.

7.13 Show that for any prime $p$ and any positive integer $m$ not divisible by $p$, there exists a primitive m-th root of unity in $\mathbb{Q}_p$ if and only if $m$ divides $p - 1$.

7.14 Show that the set of roots of unity in $\mathbb{Q}_p$ is a subgroup of the group of p-adic units.

7.15 Prove that a rational number $x \in \mathbb{Q}$ is a square if and only if it is a square in every p-adic field $\mathbb{Q}_p$ and in the real numbers $\mathbb{R}$.

7.16 Let $\mathbb{Z}_2$ be the 2-adic integers. Show that if $b \in \mathbb{Z}_2$ and $b \equiv 1 \pmod 8$, then $b$ is a square in $\mathbb{Z}_2$.

7.17 Show that the equation $(x^2 - 2)(x^2 - 17)(x^2 - 34) = 0$ has a solution in the real numbers $\mathbb{R}$ and in every p-adic field $\mathbb{Q}_p$ with $p$ prime, but has no solution in the rational numbers $\mathbb{Q}$.